RELIABILITY ASSESSMENT OF A
SYSTEM INTEGRITY PROTECTION
SCHEME FOR TRANSMISSION
NETWORKS
A thesis submitted to The University of Manchester for the degree of
Doctor of Philosophy
in the Faculty of Science and Engineering
2017
by
Nan Liu
School of Electrical and Electronic Engineering
Contents
Contents
List of Figures
List of Tables
List of Abbreviations
Abstract
Declaration
Copyright Statement
Acknowledgement
Publications
CHAPTER 1
INTRODUCTION
1.1. Power System Reliability
1.2. Project Motivation and Objectives
1.3. Contributions
1.4. Outline of the Thesis
CHAPTER 2
RELIABILITY OF SYSTEM INTEGRITY PROTECTION SCHEME
2.1. Introduction of System Integrity Protection Scheme
2.1.1. SIPS Applications
2.1.2. SIPS Classification
2.2. SIPS Design Consideration
2.2.1. Initiating Conditions
2.2.2. Time Requirements
2.2.3. Redundancy Consideration
2.3. SIPS: Industry Experience
2.3.1. SIPS Applications and Maloperations
2.3.2. SIPS Reliability Criteria
2.4. Existing SIPS Applications
2.4.1. Dinorwig Intertrip Scheme
2.4.2. PacifiCorp’s Jim Bridger RAS
2.4.3. Southern California Edison Centralised RAS
2.5. Review of Major SIPS Maloperations
2.5.1. Irish System Disturbance, 5th August 2005
2.5.2. SIPS Maloperation in Nordic Grid, 1st of December 2005
2.6. Summary
CHAPTER 3
ASSESSING THE IMPACT OF ICT RELIABILITY ON SIPS APPLICATION
3.1. The Role of ICT in Power System Protection
3.1.1. Impact of ICT on Power System Protection
3.1.2. Impact of ICT on SIPS
3.2. Communication Infrastructure of SIPS
3.2.1. General SIPS Communication Infrastructures
3.2.2. Wide Area Communication Network
3.2.3. Substation Automation System
3.2.4. Centralised SIPS: Speed Requirement
3.3. IEC 61850 based Substation Automation System and its Reliability Model
3.3.1. IEC 61850 based Substation Station Bus Architectures
3.3.2. IEC 61850-9-2 based Process Bus Architectures
3.3.3. Reliability Model of the Substation Automation System
3.3.4. Reliability Data
3.4. Reliability Assessment of SAS Communication Services
3.4.1. Reliability of Two-terminal Communication
3.4.2. Reliability of Multi-Terminal Communication
3.4.3. Sensitivity Analysis
3.5. Summary
CHAPTER 4
PROTECTION AND CONTROL ASSET END-OF-LIFE ANALYSIS
4.1. Introduction
4.1.1. Literature Review on End-of-Life Assessment
4.1.2. Asset Life Extension (ALE) Project Test Process
4.1.3. Benefits and Risks of Asset Life Extension
4.2. UK National Grid Asset Life Extension Project
4.2.1. National Grid Protection and Control Asset Life Extension (ALE) Project
4.2.2. Asset Life Extension (ALE) Study of Selected Protection Relays
4.2.3. Relay Defect Data Analysis
4.2.4. Environment Influence
4.3. Laboratory Evaluation Results on Selected Relays
4.3.1. Laboratory Inspection
4.3.2. Fingerprint Performance Testing
4.3.3. Stress Testing-Simulated In-service Conditions
4.3.4. In-Depth Evaluation of Modules and Components
4.3.5. System Level Failure Mode, Mechanism and Effect Analysis
4.4. Conclusions and Future Works
4.4.1. Recommendations
4.4.2. Further Work and Application to other Equipment Types
4.5. Summary
CHAPTER 5
RISK ASSESSMENT OF A SYSTEM INTEGRITY PROTECTION SCHEME
5.1. Literature Review of SIPS Reliability Assessment Method
5.2. SIPS Risk Assessment Procedures
5.2.1. Reliability Assessment
5.2.2. Impact Assessment
5.2.3. Risk Assessment
5.3. SIPS Communication Infrastructure Modelling
5.3.1. Introduction of Studied SIPS Communication Architectures
5.3.2. Communication System Modelling
5.4. SIPS Reliability Assessment
5.4.1. Failure Mode and Effect Analysis
5.4.2. Markov Modelling
5.4.3. Reliability Block Diagram
5.4.4. Reliability Assessment Results
5.5. Risk Assessment Numerical Illustration: Analytical Method
5.5.1. GRS Operating Logic
5.5.2. Analytical Risk Assessment Procedures
5.5.3. Analytical Risk Assessment Results
5.6. Sensitivity Study
5.6.1. Impact of Component Reliability on GRS Risk
5.6.2. Impact of System Conditions on GRS Risk
5.7. Summary
CHAPTER 6
RISK OF IMPLEMENTING SIPS IN A SYSTEM WITH LARGE-SCALE WIND INTEGRATION
6.1. Future UK Power System
6.1.1. Future Energy Scenarios and Wind Generation
6.1.2. Load Profiles
6.1.3. Transmission Line Reinforcements
6.2. Stochastic Risk Assessment Procedures
6.3. System Condition Time-series Model
6.3.1. Wind Forecast Model
6.3.2. Power System Load Profile
6.4. Numerical Illustration of Stochastic SIPS Risk Assessment
6.5. Stochastic Risk Assessment Results
6.6. Comparison between Local GRS and System Wide GRS
6.7. Impact of Variation in Wind Level on Risk Assessment Results
6.8. Summary
CHAPTER 7
MANAGING THE RISK OF SIPS IN POWER SYSTEM LONG-TERM PLANNING
7.1. Introduction of Electric System Planning with SIPS
7.1.1. Electric System Long-term Planning with SIPS
7.1.2. Challenges in SIPS Coordination
7.2. Risk Assessment Methodologies Considering SIPS Interaction
7.2.1. Description of the System-level Multi-state Markov Model
7.2.2. Modified Impact Assessment Procedure
7.3. Method Numerical Illustration
7.3.1. Reliability Assessment Results
7.3.2. Impact Assessment Results
7.3.3. Risk Assessment Results
7.4. System Planning Incorporating SIPS
7.5. Sensitivity Study
7.6. Managing the SIPS Risk Using Adaptive SIPS
7.7. Summary
CHAPTER 8
CONCLUSIONS AND FUTURE WORK
8.1. Introduction
8.2. Conclusions
8.3. Future Work
References
Appendix A: Protection Fingerprint Testing
A.1 Line Parameters and Protection Settings
A.2 PSCAD model used for dynamic fault based distance relay testing
Appendix B: Vulnerable Components Assessment
B.1 Vulnerable Components Examined via X-ray Tomography
B.2 Component Degradation Mechanisms
B.3 Structural Investigation via 3D X-Ray Microtomography
Appendix C: IEEE Reliability Test System Load Profile
Appendix D: Reliability Assessment Results for a system with two SIPS
Word count: 64,761
List of Figures
Figure 2-1: General Structure of System Integrity Protection Scheme
Figure 2-2: SIPS Design Process
Figure 2-3: System Integrity Protection Scheme Typical Operating Times
Figure 2-4: SIPS Maloperations and Causes from 2000 to 2009 NERC Reports
Figure 2-5: One Line Diagram of North Wales Supergrid
Figure 2-6: Line Outage Detection Logic used in GRS
Figure 2-7: Geographic Overview of PacifiCorp’s Jim Bridger Transmission System
Figure 2-8: Jim Bridger RAS Triple Modular Redundant (TMR) System
Figure 2-9: The Existing and Forecasted RASs in SCE’s Service Territory
Figure 2-10: SCE CRAS High-level Network Architecture
Figure 2-11: The Ireland Transmission System Map
Figure 2-12: Frequency change during Irish Disturbance on 5th August 2005
Figure 2-13: Nordic Grid and the Protection Schemes
Figure 3-1: Supervision of Backup Relays to Prevent Zone 3 Maloperation
Figure 3-2: General SIPS Architecture with Central Processors
Figure 3-3: WAN SONET Architecture
Figure 3-4: Substation Automation Architecture from Hardwire to IEC 61850
Figure 3-5: Time Breakdown of a Time-Critical SIPS Application
Figure 3-6: Star (left) & Ring (right) Type SCN Architectures
Figure 3-7: Example of IEC 62439-3 HSR Network
Figure 3-8: Redundant Double-Star (left) & Double-Ring (right) SCN Architectures
Figure 3-9: Two Process Bus Sensor Network Architectures
Figure 3-10: Basic Two-Component System in (a) Series and (b) Parallel
Figure 3-11: 4-State Markov Model
Figure 3-12: Reliability Block Diagram of different SAS Architectures for Reporting Service
Figure 3-13: MTTF & Cost of Considered SCN Architectures
Figure 3-14: Breaker Failure Protection for Different Station Arrangements
Figure 3-15: Communication Path of Arch7 for Distributed Function
Figure 3-16: Reliability of SAS to Perform Multi-Terminal Communications
Figure 3-17: Impact of MTTF on System Unreliability for Arch 1
Figure 4-1: Bathtub Curve for End-of-life Assessment
Figure 4-3: ALE Project Investigation Process
Figure 4-2: UK National Grid Relay Age Distribution (by the end of 2014)
Figure 4-4: SHNB Relay (left) and Its Comparator Module PCB (right)
Figure 4-5: THR Relay (left) and Its Internal PCBs (right)
Figure 4-6: LFCB Relay (left) and Its Internal PCBs (right)
Figure 4-7: Static Fault based Distance Relay Testing in Omicron ‘Distance Relay’ Module
Figure 4-8: LFCB Dual Slope Bias Characteristics
Figure 4-9: Connections for LFCB Bias Characteristic Testing
Figure 4-10: Thermal Images and Components within LFCB Power Supply Module
Figure 4-11: Thermal Images on Components within Modules 2&3 (Relay Outputs 1&2)
Figure 4-12: X-Ray Tomography Images of LFCB Voltage Regulator IC14, Module 4
Figure 4-13: X-Ray Tomography Images of Voltage Regulator IC23, 15-year Old Relay
Figure 4-14: Acoustic Microscopy Images showing the Evolution of Degradation in a TO-220 Package Die Attachment during Thermal Cycling
Figure 5-1: SIPS Reliability Assessment Procedures
Figure 5-2: Protection and Communication Architecture of a GRS
Figure 5-3: RBD to Assess the Dependability (a) and Security (b) of the Substation Sensor Network
Figure 5-4: Communication Path for Multicast GOOSE in PRP based Double-Ring
Figure 5-5: RBD to Assess the Dependability (a) and Security (b) of the PRP Ring LAN
Figure 5-6: RBD for SONET WAN (a) Dependability and (b) Security
Figure 5-7: Markov Model for SIPS Component Reliability Assessment
Figure 5-8: 3-Bus System with Generator Rejection Scheme (GRS)
Figure 5-9: Fault Tree Analysis (FTA) to Assess the Probability of GRS DBM
Figure 5-10: GRS Risk Assessment Results for Different Sensor Network Architectures
Figure 5-11: GRS Risk Comparison with and without Intertripping (I/T) Signal
Figure 5-12: Impact of MTTF and MTTR on Risks of Different GRS Designs
Figure 5-13: Impact of Reliability of each GRS Phase on Overall Risks for Local GRS
Figure 5-14: Impact of Critical Line Outage Rate on GRS Risks
Figure 5-15: Impact of Load Level on GRS Risks
Figure 6-1: Gone Green Transmission Generation Mix
Figure 6-2: Variation in Daily Load Profile for Different Energy Scenarios
Figure 6-3: Risk Assessment Procedure using SMCS
Figure 6-4: Procedures to Produce Time-Series Wind Farm Output Data
Figure 6-5: Wind Speed Data Distribution and Wind Turbine Model
Figure 6-6: IEEE RTS Yearly Load Profile
Figure 6-7: IEEE 24-Bus Reliability Test System with GRS Logic
Figure 6-8: Comparison between System Risks with and without GRS
Figure 6-9: Simulated and Historical Wind Speed Data Probability Density Function
Figure 6-10: Coefficient of Variation in SIPS Risk with Simulation Hours
Figure 6-11: Annual Risks Induced by Different GRS Designs
Figure 6-12: Comparison between a Local GRS and a System-Wide GRS
Figure 6-13: Monthly Average Wind Speed Variation over 100 years
Figure 6-14: GRS Risks under Various Average Monthly Wind Levels
Figure 7-1: Conceptual Relationship between SIPS Number and System Operational Risks [96]
Figure 7-2: Risk Assessment Procedure Considering SIPS Interactions
Figure 7-3: System-level Markov Model to Assess Interaction between Two SIPS
Figure 7-4: Simplified System-level Markov Model for a System with Multiple SIPS
Figure 7-5: Modified PJM 5-bus System with Wind Farms
Figure 7-6: Impact Assessment Results under Various System Conditions: (a) Impact of DBM related States. (b) Impact of SBM related States.
Figure 7-7: Annual Risk Induced by Different Local SIPS Designs
Figure 7-8: Annual Risk Induced by Different System-Wide Centralised SIPS Designs
Figure 7-9: Variation in Risks (Arch4 voting) with Number of SIPS in System
Figure 7-10: PJM 5-Bus System with Transmission Expansion
Figure 7-11: Variation in SIPS Risks over a Planning Horizon of 25 Years
Figure 7-12: Impact of Variation in Reliability Data on SIPS Risks
Figure 7-13: Adaptive SIPS using WAMPC Platform
Figure 7-14: Variation in SIPS Risks under Different System Conditions
Figure 7-15: Variations in SIPS Risks in Three Typical Days
Figure 7-16: Operational Logic of Adaptive SIPS during a Day for each Scenario
Figure A-1: PSCAD Model used for Dynamic Fault based Distance Relay Testing
Figure B-1: Typical Plastic Encapsulated Transistor Package
Figure B-2: Internal Structure of HMOS Microcontroller Identified as Operating above Ambient Temperature in SHNB Module 16
Figure B-3: Metal can packaged voltage regulator (IC11, Modules 21/23/25, SHNB 101)
List of Tables
Table 2-1: SIPS Categories by Type of Corrective Actions
Table 2-2: SIPS Survey Results
Table 2-3: SIPS Failures Recorded by NERC from 1986 to 1995
Table 2-4: ISA and IEC Defined Safety Integrity Level (SIL)
Table 2-5: Spurious Trip Level (STL) in terms of P(SBM) and STR
Table 3-1: Grace Time for Substation Automation Systems
Table 3-2: Reconfiguration Time for Common Redundancy Protocols
Table 3-3: Substation Component Reliability Data
Table 3-4: Reliability Assessment Results for Reporting Service
Table 3-5: Reliability Data for Conducting Distributed Functions
Table 3-6: RRW of each component in Arch 1&8
Table 4-1: UK National Grid Policy on Relay Lifetime
Table 4-2: Relay Population and Anticipated Lifetime
Table 4-3: Maloperations for each Relay Type from 2000-2013
Table 4-4: Causes of Relay Maloperations
Table 4-5: Summary of Ambient Temperatures Recorded over a Period of One Year
Table 4-6: Relay Samples used for Laboratory Testing
Table 4-7: Fingerprint Testing Results for Static Faults
Table 4-8: Fingerprint Testing Results for Dynamic Faults
Table 4-9: LFCB 103 (208284J) (9 years in-service time) Testing Results
Table 4-10: LFCB 103 (547373C) (16 years in-service time) Testing Results
Table 4-11: Alstom P545 Differential Characteristic Testing Results
Table 4-12: Ratings and Assessed Overload Capabilities of Protective Relays
Table 4-13: THR PS10 Power Supply Unit Components and Voltage Stress
Table 4-14: Thermal Imaging of LFCB Relay and Examined Hot Components
Table 4-15: Recommended Relay Lifetime based on Evaluation Results
Table 5-1: Substation based Sensor Network Reliability Assessment Results
Table 5-2: LAN and WAN Reliability Assessment Results
Table 5-3: Generation Data of the 3-bus System
Table 5-4: Impact Assessment for GRS Misoperation
Table 5-5: Entry Point for GRS Risk to Reach below $1/hr
Table 6-1: Risk Assessment Results for Different GRS Designs
Table 7-1: Probability of each Operational State in a System with Two SIPS
Table 7-2: Variation in the Probability of Interactions between SIPS for Arch4 (voting)
Table 7-3: Impact Assessment Data of Different SIPS Operation
Table 7-4: System Production Cost and Wind Curtailment with Simulation Year
Table C-1: Weekly Peak Load in Percent of Annual Peak
Table C-2: Daily Load in Percent of Weekly Peak
Table C-3: Hourly Peak Load in Percent of Daily Peak
Table D-1: Reliability Assessment Results for SIPS Interaction
List of Abbreviations
AHI Asset Health Index
ALE Asset Life Extension
ARMA Autoregressive Moving Average
BFP Breaker Failure Protection
BPU Bay Protection Unit
CB Circuit Breaker
C-GRS Centralised Generator Rejection Scheme
CRAS Centralised Remedial Action Scheme
CT Current Transformer
DANH Doubly Attached Node running High-availability Seamless Ring
DBM Dependability-based Maloperation
DG Distributed Generation
EENS Expected Energy Not Served
EM Ethernet Media
EMI Equipment Modification Instruction
EMS Energy Management System
ESW Ethernet Switch
ETYS Electricity Ten-year Statement
EV Electric Vehicle
FACTS Flexible Alternating Current Transmission System
FMEA Failure Mode and Effect Analysis
FMMEA Failure Mode, Mechanism and Effect Analysis
FTA Fault Tree Analysis
GOOSE Generic Object Oriented Substation Event
GPS Global Positioning System
GRS Generator Rejection Scheme
HMI Human Machine Interface
HSR High-availability Seamless Ring
HVDC High Voltage Direct Current
IC Integrated Circuit
ICT Information and Communication Technology
IED Intelligent Electronic Device
IEEE The Institute of Electrical and Electronics Engineers
IGMP Internet Group Management Protocol
LAN Local Area Network
LOWG Loss of Wind Generation
MTTF Mean Time to Failure
MTTR Mean Time to Repair
MU Merging Unit
NCC National Control Centre
NERC The North American Electric Reliability Corporation
OTS Operational Tripping Scheme
PCB Printed Circuit Board
PDF Probability Density Function
PLC Programmable Logic Controller
PMU Phasor Measurement Unit
PPI Protection Performance Information
PRP Parallel Redundancy Protocol
RAS Remedial Action Scheme
RBD Reliability Block Diagram
RoCoF Rate of Change of Frequency
RRW Risk Reduction Worth
RSTP Rapid Spanning Tree Protocol
RTS Reliability Test System
RTU Remote Terminal Unit
SAN Singly Attached Node
SAS Substation Automation System
SBM Security-based Maloperation
SCADA Supervisory Control and Data Acquisition
SCE Southern California Edison
SCN Substation Communication Network
SDH Synchronous Digital Hierarchy
SIL Safety Integrity Level
SIPS System Integrity Protection Scheme
SMCS Sequential Monte Carlo Simulation
SOF System Operability Framework
SPS Special Protection Scheme
STP Spanning Tree Protocol
STR Spurious Trip Reduction
SW Switch
TMR Triple Modular Redundant
TS Time Source
UFLS Under Frequency Load Shedding
VOLL Value of Lost Load
VT Voltage Transformer
WAMPAC Wide Area Monitoring, Protection and Control
WAMS Wide Area Monitoring System
WAN Wide Area Network
WECC The Western Electricity Coordinating Council
WF Wind Farm
WTG Wind Turbine Generator
Abstract
Reliability Assessment of a System Integrity Protection Scheme for Transmission Networks
Candidate: Nan Liu Institute: The University of Manchester
Degree: Doctor of Philosophy Date: August 2017
System Integrity Protection Schemes (SIPS) are being applied to power networks to
minimize the probability of large system disturbances and to cope with the growing size
and complexity of modern Power Systems. SIPS offer a timely and economical solution
which enhances the transmission capability whilst postponing the need for new
transmission facilities. However, recent SIPS-related incidents reveal that SIPS
maloperations could contribute to the spread of system disturbances and expose the
Power System to additional risks. In particular, the use of advanced Information and
Communication Technologies (ICT) in SIPS, along with the continuously ageing
protection assets used in the current GB National Grid, raises major concerns about the
reliable operation of SIPS.
The aim of this thesis is to provide an insight into the reliability of the protection
schemes in the transmission network and to develop investigation methods that
quantitatively assess the risk brought by SIPS. Probabilistic techniques have been
developed to identify the optimal SIPS design, in terms of both ICT infrastructure and
operational logic, that delivers the most reliable performance and the minimal risk to
system operation.
A method based on reliability block diagrams is proposed to assess the impact of ICT
failures on the communication services in an IEC 61850 based substation automation
system. In addition, an investigation process based on functional tests and invasive
examination is developed to evaluate the operational condition of the commonly used
electronic protection relay types that are approaching their predefined end of service life.
The investigation results help ensure reliable and fast automatic protection against
rapidly developing system incidents.
The risks brought by SIPS operation are studied using both analytical and stochastic
methods. A risk assessment platform based on Sequential Monte Carlo Simulation
(SMCS) is developed to capture the time-series features of the system conditions and
assess the variation in SIPS operational risk. This thesis also describes a generic
framework that uses multi-level Markov models to quantify the probability of
undesirable interactions between SIPS on the same or neighbouring systems. The
simulation results indicate that, with a widespread proliferation of SIPS, uncoordinated
SIPS operations have a severe impact on Power System reliability. The use of adaptive
SIPS, which adjust their protection logic to increasingly variable system conditions,
could effectively mitigate the operational risk.
Declaration
No portion of the work referred to in the thesis has been submitted in support of an
application for another degree or qualification of this or any other university or other
institute of learning.
Copyright Statement
i. The author of this thesis (including any appendices and/or schedules to this
thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he
has given The University of Manchester certain rights to use such Copyright,
including for administrative purposes.
ii. Copies of this thesis, either in full or in extracts and whether in hard or
electronic copy, may be made only in accordance with the Copyright, Designs
and Patents Act 1988 (as amended) and regulations issued under it or, where
appropriate, in accordance with licensing agreements which the University has
from time to time. This page must form part of any such copies made.
iii. The ownership of certain Copyright, patents, designs, trademarks and other
intellectual property (the “Intellectual Property”) and any reproductions of
copyright works in the thesis, for example graphs and tables (“Reproductions”),
which may be described in this thesis, may not be owned by the author and may
be owned by third parties. Such Intellectual Property and Reproductions cannot
and must not be made available for use without the prior written permission of
the owner(s) of the relevant Intellectual Property and/or Reproductions.
iv. Further information on the conditions under which disclosure, publication and
commercialisation of this thesis, the Copyright and any Intellectual Property
and/or Reproductions described in it may take place is available in the
University IP Policy (see
http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420), in any
relevant Thesis restriction declarations deposited in the University Library, The
University Library’s regulations (see
http://www.library.manchester.ac.uk/about/regulations/) and in The University’s
policy on Presentation of Theses.
Acknowledgement
First and foremost, I would like to express my gratitude to my supervisor, Prof. Peter
Crossley, for his altruistic supervision, invaluable guidance and continuous support
throughout my PhD research. I greatly appreciate his helpful comments and discussions,
which contributed much to this achievement.
I would like to acknowledge the School of Electrical and Electronic Engineering, the
University of Manchester, for providing financial support during my PhD studies.
I would also like to thank my colleagues and friends in the Ferranti Building. Thanks to
Dr Mathaios Panteli and Dr Zhihui Dai for their technical advice and cooperation in
publishing joint papers. I would like to acknowledge Dr Bryan Gwyn, Dr Eric Udren
and Dr Solveig Ward from Quanta Technology, Dr Pearl Agyakwa and Dr Martin
Corfield from the University of Nottingham. The project would not have been
successful without their invaluable advice. Thanks to my friend Yipeng Wang for her
understanding and support during the stressful moments throughout my PhD.
Finally, I would like to express my deepest gratitude to my family, especially my
parents and grandparents. Thank you for your selfless support, tolerance, trust and
unconditional love throughout my life. I hope that you are happy with this achievement,
because you have stood by my side at every moment of this journey.
Publications
1. N. Liu and P. Crossley, "Assessing the Risk of Implementing System Integrity
Protection Schemes in a Power System with Significant Wind Integration," IEEE
Transactions on Power Delivery, Volume: PP, Issue: 99, 2017. (Journal paper)
2. Z. Dai, P. Crossley, N. Liu and X. Liu, "Probabilistic Identification Method of
Distance Protection Misoperation due to Power Flow Transfer," Int. Trans. on Electr.
Energ. Syst., Volume 27, Issue 3, March 2017. (Journal paper)
3. N. Liu and P. Crossley, "Risk assessment of a generator rejection scheme
implemented in a wind farm," in 2016 IEEE Power and Energy Society General
Meeting (PESGM), 2016, pp. 1-5. (Conference paper, oral and poster)
4. N. Liu, M. Panteli, and P. A. Crossley, "Risk assessment of an IEC 61850 based
substation communication network in a system integrity protection scheme," in IET
International Conference on Resilience of Transmission and Distribution Networks
(RTDN) 2015, 2015, pp. 1-6. (Conference paper and oral)
5. N. Liu, M. Panteli, and P. A. Crossley, "Reliability Evaluation of an All-digital System
Integrity Protection Scheme," in 2015 PAC World Conference, Glasgow, Scotland,
29 Jun – 02 Jul 2015. (Conference paper, oral and poster)
6. N. Liu, M. Panteli, and P. A. Crossley, "Reliability evaluation of a substation
automation system communication network based on IEC 61850," in 12th IET
International Conference on Developments in Power System Protection (DPSP
2014), 2014, pp. 1-6. (Conference paper and oral)
7. N. Liu, X. Wang, and P. A. Crossley, "Impact of Harmonics on Overcurrent Protection
Relays," in the 5th International Conference on Advanced Power System Automation
and Protection (APAP), Jeju, South Korea, 28-31 October 2013. (Conference paper
and oral)
8. M. Kuflom, P. A. Crossley, and N. Liu, "Impact of pecking faults on the operating
times of numerical and electromechanical over-current relays," in 13th International
Conference on Developments in Power System Protection 2016 (DPSP), 2016, pp. 1-6.
(Conference paper and oral)
CHAPTER 1
INTRODUCTION
1.1. Power System Reliability
An electrical power network is designed for the transmission and distribution of
electricity and is required to provide an uninterrupted and high-quality “mains” supply to
all residential, commercial and industrial customers. Criteria in system design, planning
and operation that incorporate existing and new technologies have been developed
over decades to enhance the reliable and economic operation of all Power Systems, but
especially those in the developed world. Due to continuous changes in loads, types
of generation and other key operating parameters (e.g. system inertia and rate of change
of frequency (RoCoF)), the operation and protection of many Power Systems are
becoming extremely difficult and complex.
Power Systems consist of a large number of components and infrastructures spread over
a wide geographical area. A failure in any part of the system could cause interruptions
ranging from an area with a small number of residents up to a national network,
resulting in widespread disruption of supply; in extreme cases, a failure might black
out a region, a country or a union of countries, such as the EU. Catastrophic failures
of Power Systems have occurred throughout history [1]. In recent years, the
frequency of these events has continued to increase, partly due to the complex environment
brought by the deregulation of the power industry. In addition, the economic penalties
associated with such events have become ever more severe as modern society becomes
increasingly reliant on the availability of a high-quality power supply. Power
Systems have evolved with continuing growth in demand, significant deployment of
renewables and increasing use of interconnected networks. Together, these factors have
placed additional stress on the electrical network, so that lines and other electrical
components are more frequently operated close to their operating limits.
Transmission system operators must continuously deal with the challenge of reducing the
intensity and severity of system disturbances and maintaining Power System reliability.
Reliability is defined by Bazovsky as “the probability of a device performing its
purpose adequately for the period of time intended under the operating conditions
encountered” [2]. When applied to a Power System, it is defined as:
“(System) Reliability is a general term encompassing all the measures of the
ability of the system to deliver electrical energy to all points of utilization
within acceptable standards and in the amounts desired [3].”
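Bazovsky's definition becomes a simple formula if the failure rate λ is assumed to be constant. This is a common simplification and an assumption here, not a claim about the models used later. Under it, the probability of surviving a period t is R(t) = e^(-λt), and the mean time to failure is MTTF = 1/λ. The minimal Python sketch below evaluates both for a hypothetical failure rate.

```python
import math


def reliability(failure_rate_per_year: float, years: float) -> float:
    """Probability of surviving `years` without failure, assuming a constant
    failure rate (exponential lifetime model)."""
    return math.exp(-failure_rate_per_year * years)


def mttf(failure_rate_per_year: float) -> float:
    """Mean time to failure in years under the same constant-rate assumption."""
    return 1.0 / failure_rate_per_year


lam = 0.02  # hypothetical failure rate: 0.02 failures per year
print(f"MTTF = {mttf(lam):.0f} years")              # 50 years
print(f"R(10 years) = {reliability(lam, 10):.3f}")  # exp(-0.2), about 0.819
```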
The reliability of a Power System can be evaluated by considering two functional aspects [4]:
Adequacy: the existence of sufficient facilities within the system to satisfy the
consumer demand. It includes sufficient generated energy and the associated
transmission and distribution networks required to transport the energy to the
customers in the long term. Adequacy is evaluated under static system conditions,
without considering system disturbances.
Security: the ability of the system to respond to disturbances arising
within it. It is therefore associated with the response of the system to the
disturbances to which it is subjected.
The necessity of maintaining the reliability of Power Systems has always been
recognized by Power System managers, designers, planners and operators. Redundancy
in generation, transmission and distribution facilities is built in to ensure the adequacy
and continuity of the power supply, especially in the event of failures and system
outages. In addition, criteria in system operation and planning have been developed to
ensure reliable overall system capabilities. According to [5], the criteria and techniques
first used in practice were all deterministic and were required to address the
following aspects:
a) Planning Generating Capacity: The installed generating capacity must equal the
maximum demand plus a certain percentage of reserve. Historically, the period with
the highest load was used to assess the adequacy of the generating capacity. However,
with more intermittent renewable generation in the system and less conventional
fossil fuel generation, critical situations may occur when the generation output from
the renewables is low. Consequently, the variation in renewables needs to be
considered when assessing the sufficiency in generating capacity.
b) Operating Capacity: Accurate control of system voltage and frequency is an
important aspect of stable and secure system operation. Individual generators are
scheduled and dispatched to satisfy the constantly changing load demand and to keep the
frequency within acceptable limits. Spinning reserve is required to cope with the loss
of major generating units. In addition, system voltages need to be
maintained within a secure range; this is achieved by adjusting reactive power
sources, such as generators and capacitor banks. Automatic voltage regulators are
fitted to generators to control the voltage to scheduled levels.
c) Planning Network Capacity: The power flow on transmission and distribution lines,
transformers and other current-carrying devices needs to be monitored to ensure the
thermal limits are not exceeded. In addition, the system needs to be operated reliably
during a system contingency, which might be the loss of a key generator or a
transmission line. This is known as the “N-1 Criterion”. The criterion specifies that
the Power System shall continue to operate in its normal operational state following
the loss of one generation unit, transmission line or transformer. Accordingly,
network defensive strategies need to be developed based on the assumption that the
equipment can and will fail unexpectedly. Both short-term planning (e.g. day-ahead
and week-ahead) and long-term planning are required to provide adequate generation
and transmission capacities to prevent widespread and uncontrolled cascading
outages during severe contingencies.
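The deterministic criteria in a) and c) can be written as pass/fail checks. The Python sketch below applies a percentage reserve margin and an N-1 test. The N-1 test is restricted to generating units, which is a simplification: the N-1 criterion in c) also covers lines and transformers. The unit sizes, the peak demand and the 20% margin are hypothetical.

```python
def meets_reserve_margin(unit_capacities_mw, peak_demand_mw, reserve_fraction):
    """Criterion a): installed capacity >= peak demand plus a percentage reserve."""
    return sum(unit_capacities_mw) >= peak_demand_mw * (1.0 + reserve_fraction)


def meets_n_minus_1_generation(unit_capacities_mw, peak_demand_mw):
    """Generation-only N-1 check: after losing any single unit (the worst case
    being the largest), the remaining capacity must still cover peak demand."""
    return sum(unit_capacities_mw) - max(unit_capacities_mw) >= peak_demand_mw


units_mw = [400, 300, 300, 200]  # hypothetical unit sizes
peak_mw = 950                    # hypothetical peak demand
print(meets_reserve_margin(units_mw, peak_mw, 0.20))  # True: 1200 >= 1140
print(meets_n_minus_1_generation(units_mw, peak_mw))  # False: 800 < 950
```

The same system passes the percentage-reserve test but fails N-1. The two deterministic criteria are therefore not interchangeable, and neither of them states how likely a shortfall actually is.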
Although reliability criteria have been developed that consider randomly occurring
failures in a Power System, most of them are inherently deterministic and cannot reflect
the stochastic nature of system behaviour, customer demand and component failures.
For example, deterministic analysis can consider the impact of hazards leading to a
dangerous state or system failure. However, a hazard, even if extremely undesirable,
could be of little consequence if it is very unlikely to occur. Therefore, planning based
on such hazard analysis will lead to overinvestment [6]. Consequently, probabilistic
reliability assessment methods are required to combine both the severity and the likelihood
of an event to reflect its true system risk. The objective of reliability evaluation is further
explained as:
“to indicate how a system may fail, the consequences of failures and also to
provide information to enable engineers and managers to relate the quality of
their system to economics and capital investments. In so doing it can lead to
better and more economic designs, and a much improved knowledge of the
operation and behaviour of a system [5].”
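The point about combining severity and likelihood can be shown by computing expected cost, taken here as probability multiplied by consequence. In the hypothetical figures below, the severe hazard carries a much smaller expected cost than a modest but frequent disturbance. This is why planning on severity alone can misdirect investment. Both events and all numbers are illustrative assumptions.

```python
def expected_cost(probability_per_year: float, consequence: float) -> float:
    """Risk as the product of likelihood and severity ($/year)."""
    return probability_per_year * consequence


# Hypothetical events: (description, probability per year, consequence in $)
events = [
    ("severe but very unlikely hazard", 1e-5, 50_000_000.0),
    ("moderate but frequent disturbance", 0.2, 100_000.0),
]

# Rank events by risk rather than by severity alone
for name, p, cost in sorted(events, key=lambda e: expected_cost(e[1], e[2]), reverse=True):
    print(f"{name}: ${expected_cost(p, cost):,.0f}/year")
```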
A wide range of probabilistic techniques has been developed to assess the reliability of
a Power System. In general industry practice, reliability assessment is performed
using two main approaches: analytical and simulation [7]. The analytical
approach represents the system using a mathematical model and analyses its
performance under a set of “normal contingencies” selected on the basis of their
likelihood of occurrence. System behaviour is evaluated by calculating reliability
indices from the analytical model using mathematical solutions. The analytical approach
has been widely applied in industry to help ensure reliable system operation with
relatively low computing effort. However, it is difficult to incorporate the various
operational states of a complex system, or the complex operating logic of emergency
control, into the analytical method, because the conditional probabilities of the various
operational states of a complex system are extremely difficult to estimate. Additionally,
this method has limitations in dynamically assessing the impact of emergency control
actions on the system.
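The Python sketch below illustrates the analytical approach. It computes the loss of load probability (LOLP) at one demand level by enumerating every up/down combination of a set of generating units. It assumes the classical model of independent two-state units, each described by a forced outage rate, and the unit data are hypothetical. Exhaustive enumeration grows as 2^n with the number of units, which is one reason practical analytical studies restrict themselves to a selected set of likely contingencies.

```python
from itertools import product


def loss_of_load_probability(units, demand_mw):
    """Probability that available capacity falls below demand_mw.

    units: list of (capacity_mw, forced_outage_rate) for independent
    two-state units. All 2**n up/down combinations are enumerated.
    """
    lolp = 0.0
    for states in product((True, False), repeat=len(units)):
        prob = 1.0
        available_mw = 0.0
        for (capacity_mw, forced_outage_rate), is_up in zip(units, states):
            if is_up:
                prob *= 1.0 - forced_outage_rate
                available_mw += capacity_mw
            else:
                prob *= forced_outage_rate
        if available_mw < demand_mw:
            lolp += prob
    return lolp


# Hypothetical units: (capacity in MW, forced outage rate)
units = [(400, 0.05), (300, 0.04), (300, 0.04), (200, 0.03)]
lolp = loss_of_load_probability(units, demand_mw=950)
print(f"LOLP at 950 MW demand: {lolp:.5f}")  # 1 - 0.95*0.96*0.96 = 0.12448
```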
The simulation approach estimates the reliability of the system by simulating the
stochastic nature of the system conditions and events, and uses this to quantify the risk.
This method can theoretically map all the contingencies and failures inherent in the
planning, design and operation process into the reliability model. These include random
system events such as outages and repairs of Power System elements, dependent events,
component behaviours, load and generation variations as well as operating policies.
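The simulation approach can be illustrated on a single component. The Python sketch below runs a sequential Monte Carlo simulation of one repairable component and compares the simulated availability with the analytical value MTTF/(MTTF + MTTR). Up and down durations are drawn in chronological order from exponential distributions, which is an assumption. MTTF, MTTR and the simulated horizon are hypothetical. The same chronological sampling, applied to many components together with load and generation profiles, is what lets the simulation approach capture time-dependent behaviour.

```python
import random


def simulate_availability(mttf_h: float, mttr_h: float, horizon_h: float, seed: int = 1) -> float:
    """Sequential Monte Carlo simulation of one repairable two-state component.

    Up and down durations are drawn in chronological order from exponential
    distributions (an assumption) with means mttf_h and mttr_h. Returns the
    fraction of the simulated horizon during which the component was up.
    """
    rng = random.Random(seed)
    t, up_time, is_up = 0.0, 0.0, True
    while t < horizon_h:
        mean = mttf_h if is_up else mttr_h
        duration = min(rng.expovariate(1.0 / mean), horizon_h - t)  # clip at the horizon
        if is_up:
            up_time += duration
        t += duration
        is_up = not is_up
    return up_time / horizon_h


mttf_h, mttr_h = 8760.0, 50.0  # hypothetical: about one failure per year, 50 h repair
analytical = mttf_h / (mttf_h + mttr_h)
simulated = simulate_availability(mttf_h, mttr_h, horizon_h=8760.0 * 10_000)
print(f"analytical availability: {analytical:.5f}")
print(f"simulated availability:  {simulated:.5f}")
```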
1.2. Project Motivation and Objectives
During the past few decades, considerable progress has been made in Power System
reliability modelling, and quantitative analysis based on probabilistic theory has been
applied to power system reliability assessment. Reliability of protection systems has
emerged as one of the most important aspects of Power System reliability due to its
impact on system operation. Researchers have worked to identify the impact of
protection failures on a Power System, to incorporate protection failures into the overall
system analysis and to enhance the Power System reliability evaluation process.
Conventional protection was designed to disconnect faulty or overloaded elements
whilst leaving the rest of the network in operation. Several recent system-wide disturbances
have shown that local protection, which was designed to arrest a local system problem, is
limited in its ability to arrest a wide-area disturbance. In addition, new protection
solutions are required due to significant changes in the landscape of Power Systems
[8]. With an increasingly open energy market, Power Systems continue to expand,
integrating more renewable energy, distributed generation and independent power
producers. Regulatory pressure has focused Power System operators’ attention on
increasing their return on asset investment, and this must be achieved whilst energy
consumption continues to increase and much of the Power System infrastructure is
ageing. This means existing transmission networks are expected to operate closer to
their operating limits.
Automated protection schemes are designed to detect abnormal system conditions,
typically contingency-related, and then take predetermined corrective actions to
preserve system integrity and provide acceptable system performance. These schemes
are called System Integrity Protection Schemes (SIPS), and are also known as
Operational Tripping Schemes (OTS), Special Protection Schemes (SPS) or Remedial
Action Schemes (RAS). SIPS add another dimension to conventional Power System
protection, which has limited ability to arrest wide-area disturbances. SIPS are being used
by utilities to minimize the probability and consequence of large disturbances and to
cope with the growing size and complexity of modern Power Systems. In addition, the
integration of renewables (e.g. wind generation) generally means that the sources
are remote from the load, so existing transmission lines are expected to
operate closer to their operating limits. SIPS offer a timely and economic solution to
enhance the capability of the existing transmission network and postpone the need for new
transmission facilities. Remedial actions of SIPS include changes in load (e.g. load
shedding), changes in generation or changes in system configuration to maintain system
stability, acceptable voltages or power flows.
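The Python sketch below makes a predetermined corrective action concrete with a hypothetical generator rejection scheme (GRS). When a monitored circuit trips and the export over the remaining path exceeds its post-contingency limit, units are tripped until the excess is removed. The largest-first selection rule, the function and class names and all numbers are illustrative assumptions. Practical schemes use arming and selection logic derived from planning studies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    output_mw: float


def generator_rejection(line_tripped: bool, export_mw: float,
                        post_fault_limit_mw: float,
                        generators: list[Generator]) -> list[str]:
    """Hypothetical GRS logic.

    If the monitored line has tripped and the export exceeds what the
    remaining circuit can carry, trip units (largest output first, an
    illustrative rule) until the excess is removed. Returns the names of
    the tripped units; an empty list means no remedial action.
    """
    if not line_tripped or export_mw <= post_fault_limit_mw:
        return []
    excess_mw = export_mw - post_fault_limit_mw
    tripped = []
    for gen in sorted(generators, key=lambda g: g.output_mw, reverse=True):
        if excess_mw <= 0:
            break
        tripped.append(gen.name)
        excess_mw -= gen.output_mw
    return tripped


gens = [Generator("G1", 150.0), Generator("G2", 100.0), Generator("G3", 80.0)]
print(generator_rejection(True, 330.0, 200.0, gens))  # ['G1']
```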
Recent surveys have reported a significant proliferation of SIPS installed to
enhance the transmission capability of the electrical network and accommodate more
renewable generation. The widespread application of SIPS also leads to a massive
expansion of the information and communication technology (ICT) infrastructure. The
ICT used in a Power System allows the system operator to respond to the danger caused
by abnormal system conditions in a more effective and timely manner, in this case
preventing the propagation of system disturbances. However, the increasing penetration
of advanced ICT brings significant changes in instrumentation, monitoring,
communication and control in the Power System protection area, which raises major
concerns about its reliability.
Several system disturbances demonstrated that solutions increasingly reliant on SIPS to
preserve system integrity expose the system to additional risks. The system disturbance
report issued by the North American Electric Reliability Corporation (NERC) [9] refers
to 25 SIPS-related maloperations over the period 2000 to 2009, the majority of which were
caused by hardware or software failure, faulty design logic or human error. Additionally,
a high penetration of SIPS increases the complexity of system operation. This may lead
to a higher probability of undesired interactions between the SIPS on the same or
neighbouring systems. The Irish incident on 5th August 2005 and the Nordic event on 1st
December 2005 highlighted the catastrophic impact of unintended SIPS interaction on
system reliability [10, 11].
SIPS maloperations normally lead to severe and costly consequences (e.g. customer
disconnection) due to their critical role in maintaining system integrity. Consequently, it
is vitally important to understand the failure mechanisms of SIPS and ensure that their
performance fulfils the strict reliability requirements. Most of the existing SIPS
reliability standards are deterministic and have limitations in quantitatively assessing
the risk induced by SIPS. Probabilistic techniques are therefore required to include the
impact of SIPS maloperations in the reliability assessment model.
Another potential risk to the reliability of the protection system comes from the
continuously ageing protection assets used in the current GB National Grid. A large
number of existing protection assets were commissioned in the 1980s, which means they
are approaching their designed end of service life. Consequently, an investigation
process to accurately predict the reliable service lifetime of these devices is vitally
important.
As the owner of the transmission network in England and Wales and the transmission
network operator across Great Britain, the National Grid has the obligation to ensure the
reliable delivery of electrical power without excessive cost. Motivated by these potential
risks affecting the reliability of the transmission network, the aim of this research is to
provide an insight into the reliability of the protection schemes and assets now used on
the GB transmission network, and develop investigation methods to quantitatively
assess the risks brought to system operation.
Most of the risk assessment methods developed in the previous literature focused on
using analytical reliability assessment to determine the optimal SIPS
operational logic and arming strategies. However, Power System operational conditions
will become more unpredictable due to the intermittent nature of renewables and
demand-side participation. A more dynamic, stochastic SIPS risk assessment
method is required to accurately assess the impact of fast-changing system conditions on
SIPS operational risks. With the increasing penetration of advanced ICT, changes in
the instrumentation, monitoring, communication and control in the protection system
need to be considered in the reliability assessment. Additionally, a widespread
proliferation of SIPS is expected during the next decade, partly following the greater
use of renewable generation. Current SIPS reliability assessment methods mainly
focus on the reliability of a single SIPS and have limited application in
assessing the reliability of a system with multiple SIPS. Therefore, a method that can
effectively incorporate SIPS interactions and their impact on system operation
needs to be developed.
The main objectives of this research are as follows:
To review major SIPS maloperations and investigate their failure modes and impact
on the propagation of system disturbances.
To undertake a literature review on the lifetime analysis of the protection assets.
Furthermore, to determine deterioration mechanisms and identify the life-limiting
elements within the electronic protection assets currently used in the UK National
Grid transmission network.
To investigate the impact of component failures, communication architectures and
maintenance strategies on the communication services used in a Substation
Automation System (SAS).
To develop investigation methods to quantitatively assess the risks brought by SIPS
on Power System operation.
To evaluate the impact of increased deployment of renewable energy on the
operational performance and long-term development of SIPS applications, and
effectively manage the additional operational risk caused by SIPS proliferation.
1.3. Contributions
Driven by the perception that today’s Power System is subjected to the additional risks
brought by protection maloperations, it is necessary to incorporate the impact of system
protection in the reliability assessment models and quantitatively assess the risk brought
to system operation. This project advances existing techniques in reliability assessment
of wide-area protection schemes in electric Power Systems. In addition, investigation
processes are developed to address risks induced by both modern ICT and the ageing
protection infrastructure commonly used in Power Systems. The main contributions of
this project can be summarized as follows:
Identification of the impact of modern ICT on Power System Protection
The application of advanced ICT in Power System protection is reviewed. Changes
brought to the instrumentation, monitoring, communication and
control systems in SIPS are demonstrated. In particular, a method to evaluate the
reliability of the communication services used in an IEC 61850 based digital
substation automation system is developed.
Review of National Grid Protection Performance Information (PPI) Reports for
the examination of the historical records of the main protection types in UK
transmission network
The operational performance of the three most commonly used electronic relay types in
the UK National Grid transmission network is reviewed to identify any recorded
relay maloperations. In addition, the population, age profile, maloperation causes,
failure and repair history, and reports of benchmark experience from other utilities
are studied. In particular, the National Grid PPI reports from 2000-2013 are reviewed
to identify and study relay maloperations attributed to hardware failures.
Development of an end-of-life investigation process for protection asset life
extension evaluation
An investigation process based on functional testing and invasive examination is
proposed to determine the deterioration mechanisms of electronic protection devices.
This evaluation process has been successfully applied to the three most commonly
used electronic relays in the National Grid transmission network (i.e. SHNB, THR and
LFCB) and has effectively validated five-year asset life extension decisions for
each relay type.
Proposal of risk-based reliability assessment methodologies for System Integrity
Protection Schemes
The failure modes of the main SIPS components and their impact on overall SIPS
operation are determined using Failure Mode and Effect Analysis. The risks induced
during SIPS operation come from three main sources: successful SIPS operation,
dependability-based maloperation and security-based maloperation. Two different
approaches are proposed to quantify the risk of each SIPS operational state: an
analytical method based on Markov Modelling and Reliability Block Diagrams, and a
stochastic method based on Sequential Monte Carlo Simulation. The two methods
are then applied to the IEEE 24-bus Reliability Test System incorporating GRS logic.
Assessment of risk caused by unintended SIPS maloperations
As SIPS are normally perceived as a cost-effective alternative to transmission
network expansion to enhance system capability, a widespread proliferation of SIPS
is expected during the next decade, partly following the greater use of renewable
generation. High penetration of SIPS significantly increases the complexity of
system operation and also leads to a higher probability of unintended interactions
between SIPS on the same or neighbouring systems. A method based on multi-level
Markov Modelling and Sequential Monte Carlo Simulation is proposed to effectively
incorporate all the possible SIPS states. This method helps estimate the probability of
SIPS interactions and their impact on system reliability. The results indicate that unintended
interactions between SIPS could result in cascading failures and lead to a more
severe impact, as compared with individual SIPS failure.
Management of the risk associated with SIPS in Power System long-term planning
The operating risk of SIPS, especially risks caused by SIPS interaction, would
increase significantly with greater wind integration and a higher penetration of SIPS.
The impact of long-term system planning on SIPS risk is considered by incorporating
transmission upgrading, demand increase and wind integration in the risk assessment
model. Different approaches are proposed to manage the continuously increasing
SIPS operational risk. The impact of transmission expansion on alleviating system
congestion as well as SIPS risks is studied. In addition, a new type of SIPS with
adaptive protection logic, which adjusts to the increasingly variable system
conditions, is designed to manage its operational risk and achieve better cooperation
with other protection schemes.
1.4. Outline of the Thesis
The thesis consists of eight chapters and is organized as follows:
Chapter 1 discusses the motivation, objectives and contributions of this research.
Chapter 2 provides an introduction to the application, classification, design
considerations and reliability requirements of a System Integrity Protection Scheme.
This is followed by a review of major system disturbances caused by SIPS
maloperations. The aim is to identify the main causes of SIPS maloperation and
investigate its impact on system operation and reliability. In addition, the sensing,
communication and control technologies used in SIPS, and the associated scheme
topologies and reliability enhancement methods are illustrated by reviewing existing
SIPS applications.
Chapter 3 investigates the impact of advanced ICT on Power System
protection. An overview of the SIPS communication infrastructure, together with its
reliability considerations, is first given. In particular, the impact of the IEC 61850
communication protocol on the monitoring, communication, control and protection in
the substation automation system is investigated. This is followed by the reliability
assessment of the different communication services used in IEC 61850 based digital
substations.
Chapter 4 focuses on the reliability assessment of electronic protection equipment in the
UK National Grid network. The protection assets’ population, ages, current replacement
plan and historical performance record are first reviewed. An investigation process is
next developed to validate or forecast the reliable service life of a particular protection
type. The operational behaviours of the studied relays are compared with modern
numerical relays to check whether these contemporary replacement relay types could
offer a meaningful performance improvement that could influence the replacement
decision. Potential life-limiting conditions are identified by examining the operational
conditions of components on energized relay modules. Finally, management and
maintenance strategies are recommended to National Grid to ensure reliable service of
these protection assets.
Chapter 5 demonstrates the use of an analytical method to quantitatively assess the risk
brought by SIPS. A literature review of previously developed methodologies for assessing
SIPS reliability is provided. The developed evaluation procedures include estimating
the probability of SIPS maloperation, evaluating the consequences of different
operational states on system reliability and quantifying the SIPS operational risk in
financial losses. Sensitivity analysis is used to identify the impact of uncertainties on the
SIPS operational risks.
Chapter 6 introduces a modified SIPS risk assessment procedure based on a stochastic
Sequential Monte Carlo Simulation to effectively reflect the time-series changes in
system condition. The future trend in energy scenarios and generation mix in the UK
Power System is estimated. An auto-regressive moving average (ARMA) model is
developed to forecast future wind speed based on historical data and is mapped into the
Sequential Monte Carlo Simulation process to reflect the intermittent nature of wind
generation. The method is then applied to the IEEE 24-bus Reliability Test System with
significant wind integration to check its impact on the operational risk of a Generator
Rejection Scheme (GRS).
Chapter 7 considers SIPS within the context of Power System long-term planning. The
widespread proliferation of SIPS introduces additional risks to SIPS operation due to a
higher probability of unintended SIPS interaction. The challenges in SIPS coordination
following increased operational complexity and SIPS penetration are first discussed.
Next, a risk assessment procedure, based on the method in Chapter 6, is
proposed to quantitatively assess the risk from possible SIPS interaction scenarios.
Finally, reliability enhancement methods such as transmission expansion and the use of
‘adaptive’ SIPS are introduced to manage the risk of SIPS implementation.
Chapter 8 presents the main conclusions from this research. The findings and
contributions are summarized. Finally, suggestions for future work on SIPS reliability
assessment are discussed.
CHAPTER 2
RELIABILITY OF SYSTEM INTEGRITY
PROTECTION SCHEME
2.1. Introduction of System Integrity Protection Scheme
A reliable electrical power supply is an important requirement for all consumers and
especially those working and living in advanced urban economies. Power Systems are
among the most critical infrastructures built by man. Therefore, the primary emphasis has
always been to provide uninterrupted, high quality supply to residential, commercial
and industrial customers. Spare and redundant facilities for both the generation and
transportation of electrical energy have been built to ensure the continuity of supply in
case of equipment and human failures and especially transmission line outages.
Nowadays, system-wide disturbances are becoming a more important issue in the Power
System, as illustrated by several recent blackouts [12, 13]. During a major system
disturbance, protection and control systems are required to limit or stop system
degradation, restore the system to a normal state and minimize the impact of the
disturbance. However, traditional protection systems are designed to arrest local system problems or
protect a single item of plant. Therefore, such systems have limited communication with
other parts of the system and are not intended to arrest wide area disturbances. The
impact of system-wide disturbances and the prevention of blackouts require the
protection system to be integrated with modern technologies and designed to preserve
system integrity under severe system conditions. In addition, significant changes in the
landscape of Power Systems, and especially those including significant renewable and
intermittent resources also require the use of new protection solutions. For example, in
the past, electrical power was often generated by coal stations located close to the
demand centres, whereas in the future coal stations will be decommissioned and power
will be generated by wind and nuclear, shifting the location of generation away from
load. Meanwhile, the trend in Power System planning is leading to a system with tight
operational margins and less transmission redundancy. Hence transmission networks
become more critical and may be expected to operate frequently close to their
operating limits.
All these fundamental changes in the design, operation and planning of the electric
Power System encourage the use of a system-wide protection solution, integrated into
the Power System and designed to minimize the probability of large disturbances and
cope with the growing size and complexity of modern Power Systems. Consequently,
automated protection schemes are designed to detect system abnormal conditions,
typically contingency-related, and then take predetermined corrective actions to
preserve system integrity and provide acceptable system performance. These schemes
are called System Integrity Protection Schemes (SIPS). SIPS are also known as
Operational Tripping Schemes (OTS), Special Protection Schemes (SPS) or Remedial
Action Schemes (RAS). They are commonly used by utilities as a timely and economic
solution to enhance system security and postpone the need for new transmission
facilities. Remedial actions of SIPS include changes in load (e.g. load shedding),
generation or changes in system configuration to maintain system stability, acceptable
voltages or power flows.
A general SIPS operation consists of three main steps: input, decision making and
control application. Examples for each SIPS operational phase are listed in Figure 2-1.
The input of SIPS is normally the electric variables measured at various locations or the
direct detection of an event such as the open/closed status of circuit breakers, etc. Inputs
from the power system are then sent to the local logic processor or control centre, where
stability analysis is performed and a decision made on whether SIPS operation is
required and how this will maintain system security. Once a decision to take action is
made, the control command will be communicated to the mitigation devices in the field
to execute the corrective action.
Figure 2-1: General Structure of System Integrity Protection Scheme (a disturbance on the Power System is handled by the SIPS in three stages: Input, e.g. power flow, voltage, ROCOF and generation/load monitoring; Decision Making, e.g. arming calculation, logic control and mitigation level calculation; Control Application, e.g. generator tripping and load shedding)
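The three operational steps described above can be illustrated with a minimal Python sketch. It assumes a single monitored circuit whose parallel breaker status and flow are the only inputs, and an assumed rule that trips generation equal to the overload (i.e. a one-to-one sensitivity). The names `Measurement`, `decide`, `apply_control` and `sips_cycle` are hypothetical and do not describe any particular scheme.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Input step: a hypothetical snapshot of field measurements."""
    parallel_breaker_open: bool  # status of the breaker on the parallel circuit
    line_flow_mw: float          # flow on the remaining, monitored circuit


def decide(m: Measurement, flow_limit_mw: float) -> float:
    """Decision-making step: MW of generation to trip (0.0 means no action).

    Assumed rule: act only if the parallel circuit has tripped AND the
    remaining circuit is above its limit; trip exactly the excess flow.
    """
    if m.parallel_breaker_open and m.line_flow_mw > flow_limit_mw:
        return m.line_flow_mw - flow_limit_mw
    return 0.0


def apply_control(trip_mw: float) -> str:
    """Control-application step: stands in for a command sent to field devices."""
    return f"trip {trip_mw:.0f} MW" if trip_mw > 0 else "no action"


def sips_cycle(m: Measurement, flow_limit_mw: float) -> str:
    """One pass through input, decision making and control application."""
    return apply_control(decide(m, flow_limit_mw))


print(sips_cycle(Measurement(True, 1250.0), 1000.0))   # trip 250 MW
print(sips_cycle(Measurement(False, 1250.0), 1000.0))  # no action
```

Keeping the three steps as separate functions mirrors the separation in Figure 2-1: in a real scheme each step may run on different hardware linked by communication channels.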
2.1.1. SIPS Applications
SIPS aim to trigger corrective actions when detecting abnormal system conditions.
These actions help maintain the integrity of the Power System against the following
issues:
System congestion
Small-disturbance angle instability
Transient instability
Frequency instability
Voltage instability
Thermal overloading
Etc.
SIPS are designed to mitigate these critical contingencies which may then initiate wide
area system problems. Various remedial actions can be applied to improve system
performance. The selection of the control action is based on the power system topology
and its tolerance to the risks brought by the control actions. For example, generator
tripping is an effective method to balance the generation with load and to preserve the
transient, frequency and voltage stability. However, it may also reduce the system
inertia, damage the generator drive shaft and cause thermal stress within the generator
which may be unacceptable for some system conditions.
In general, different control strategies are applied to address various disturbance
propagations. The categories and percentages for each type of SIPS action are tabulated
in Table 2-1, according to the most recent survey on SIPS conducted by IEEE-CIGRE in 2010 [14]. It
can be seen that load rejection (10%) and under frequency load shedding (UFLS, 8%)
were among the most commonly used SIPS at that time. Load rejection is a protection scheme
designed to operate following a system event which causes supply-load imbalances that
may eventually lead to a wide area system disturbance. It ensures that a system or subsystem
remains in parallel with the remaining parts of the system in case of loss of a major supply. The
load rejection SIPS differs from the automated under-frequency load shedding. It is
designed to separate the system before the change of frequency can trigger the operation
of the under-frequency relays. With the rapid growth in wind generation, the generator
rejection schemes (GRS) are becoming more frequently used to help alleviate system
congestion and allow greater access to lower-cost power.
Table 2-1: SIPS Categories by Type of Corrective Actions
Load Shedding | Generation Control - Slow speed | System Stability
Load rejection (10%) | Generator rejection (8%) | Out-of-step tripping (7%)
Under frequency load shedding (8%) | Power System stabilizer control (3%) | Voltage instability advance warning (2%)
Under voltage load shedding (6%) | Discrete excitation (1%) | Angular stability advance warning (1%)
Adaptive load mitigation (2%) | Generator runback (3%) | System separation (7%)
Overload mitigation (7%) | AGC actions (4%) | Dynamic braking (1%)
Controls - Slow speed | Controls - High speed reactive voltage compensation | Congestion Mitigation
Tap-changer control (2%) | Bypassing series capacitor (2%) | Congestion mitigation (3%)
Turbine valve control (1%) | Shunt capacitor switching (5%) | Load and generation balancing (3%)
Black-start or gas turbine start-up (1%) | SVC/STATCOM control (4%) | Busbar splitting (2%)
- | HVDC controls (3%) | -
2.1.2. SIPS Classification
SIPS are installed to preserve stability or integrity of the overall Power System or its
strategic portions. Therefore, the application of SIPS may require multiple monitoring
and implementation devices distributed across the system and the use of
communication facilities. SIPS can be classified by many factors, including architecture,
input variables and operating times, etc.
1) SIPS Architectures
SIPS can be classified in terms of their architectures. For example, based on the
physical location of the sensing, decision making, and control devices and the impact of
the scheme on the Power System, SIPS can be classified into the following categories:
a) Flat Architecture: For this type of SIPS, all the measurement, decision-making and
control devices of the SIPS are typically located at a single site. The decision
making and the initiation of the corrective action may also require remote
information collected by the communication facility. This type of SIPS
normally has a dedicated function and its operation only affects a portion of the system. An
example of flat architecture is the under frequency load shedding (UFLS) scheme.
The UFLS relays are normally distributed at different locations in the network and
operate to trip preselected circuit breakers to disconnect small sections of distributed
network and their associated loads when frequency drops below pre-set values.
b) Hierarchical Architecture: SIPS with a hierarchical architecture involve multiple
steps in their control actions. This type of SIPS requires communication between
substations to transfer the local measurements and predetermined parameters to
multiple control locations and is able to make decisions based on a system-wide
view. The operating logic involves the use of operating nomograms. State estimation
and contingency analysis can also be integrated into the decision making process.
Consequently, the coordination between different protection actions throughout a
wide area network can be achieved via multi-level corrective actions. For example, a
system separation scheme normally has a hierarchical architecture which involves
monitoring of multiple interconnecting circuits, sending trip signals to circuits at
different locations and altering the power flow on other interconnectors.
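To make the flat case concrete, the following Python sketch models a single UFLS relay with staged frequency thresholds. The three stages, the 5% load blocks and the 50 Hz nominal frequency are illustrative assumptions rather than settings from any real network, and the time delays used by practical relays are ignored.

```python
# Hypothetical UFLS settings for a 50 Hz system:
# (frequency threshold in Hz, fraction of the local load disconnected by that stage)
UFLS_STAGES = ((48.8, 0.05), (48.6, 0.05), (48.4, 0.05))


def ufls_shed_fraction(frequency_hz: float,
                       stages: tuple[tuple[float, float], ...] = UFLS_STAGES) -> float:
    """Fraction of local load shed by one UFLS relay at the given frequency.

    Each relay uses only its own local frequency measurement and pre-set
    thresholds, with no communication: the single-layer decision that makes
    UFLS an example of a flat architecture. Time delays are ignored here.
    """
    return sum(fraction for threshold, fraction in stages if frequency_hz < threshold)


for f in (49.0, 48.7, 48.5, 48.3):
    print(f, ufls_shed_fraction(f))
```

A hierarchical scheme would instead pass such local measurements to a higher-level controller that decides, with a system-wide view, which loads or circuits to act on.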
The main difference between the two SIPS architectures is the necessity for the control
coordination to take a higher and wider system view to implement the protection actions.
A flat architecture involves a single layer of decisions and actions whereas the
hierarchical scheme may involve multiple layers of decision making and control actions and
requires communication between substations. The application of the two architectures is
also dependent on the system condition and the required protection speed. Immediate
control actions initiated by the local schemes are sufficient in some small systems.
However, in a large and highly interconnected system, control coordination with state
estimation operating over a wide-area system may be required to prevent the
propagation of the disturbance.
2) Centralised and Distributed Schemes
Another classification is to separate SIPS into centralised SIPS and distributed SIPS
based on the location of its controller and corrective devices. In a distributed scheme,
the controllers installed at different locations in a system are used to implement the
corrective actions. The distributed intelligent electronic devices (IED) can process the
local information based on local requirements. The system protection function can be
realized by integrating and coordinating the distributed controllers which provide the
corrective actions. For a centralised scheme, the wide area monitoring system (WAMS)
gathers all the information required from local and remote stations at one location, where
the decision-making process is implemented. The centralisation of the distributed
information can be realized as part of the energy management system (EMS), using a
centralised programmable logic controller (PLC) or remote terminal unit (RTU).
3) Input variables
According to the input variables and decision making process as described in [15], SIPS
can also be classified as follows:
a) Event-based: In an event-based scheme, electrical outages are directly detected and
initiate the corresponding emergency action such as generation rejection and load
tripping.
b) Parameter-based: Parameter-based schemes are initiated by significant changes in
the measured variables.
c) Response-based: In response-based schemes, system response during emergencies
is monitored and a closed loop is incorporated in the decision making process to
determine the best response to the system situation.
d) Combination of the above: In practice, most schemes are combinations of the
above types. For example, some schemes are triggered by a combination
of events and parameters.
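The difference between the trigger types can be sketched in Python as follows. The function names, thresholds and the toy system in which each load-shedding step raises frequency by 0.15 Hz are hypothetical; the sketch only shows where the decision comes from in each case: a detected event, a measured change, or the observed response inside a closed loop.

```python
from typing import Callable


def event_based(breaker_open: bool) -> bool:
    """Event-based: a detected outage (e.g. breaker opening) triggers action directly."""
    return breaker_open


def parameter_based(measured_change: float, threshold: float) -> bool:
    """Parameter-based: a significant change in a measured variable triggers action."""
    return abs(measured_change) > threshold


def response_based(measure: Callable[[], float], act: Callable[[], None],
                   target: float, max_steps: int = 5) -> int:
    """Response-based: closed loop that keeps acting while the response is unacceptable.

    Returns the number of corrective steps applied.
    """
    steps = 0
    while measure() < target and steps < max_steps:
        act()
        steps += 1
    return steps


# Toy system (assumed): each load-shedding step raises frequency by 0.15 Hz.
state = {"freq_hz": 49.4}


def shed_load_block() -> None:
    state["freq_hz"] += 0.15


steps = response_based(lambda: state["freq_hz"], shed_load_block, target=49.8)
print(steps)

# Combination: an event must occur AND a parameter must exceed its threshold.
combined = event_based(True) and parameter_based(12.0, 10.0)
print(combined)
```

The `max_steps` bound stands in for the finite amount of corrective action a real scheme has available.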
2.2. SIPS Design Consideration
As shown in Figure 2-2, the SIPS design process can be broken down into five steps [16]:
1) System Study: Accurate system study needs to be completed to identify all the
contingency scenarios and determine the parameters required by the control centre
from the monitoring system of SIPS. In particular, the thermal, voltage or regular
instability related system limitations or restrictions under various system
contingencies are evaluated. The arming criteria and reliability levels also need to be
determined.
2) Solution Development: The minimum actions required for each type of system
contingency are determined based on the system study. The corrective actions for
different SIPS applications can be found in Table 2-1. For example, the amount of
load shedding, generator rejection, stability limits and voltage limits for the different
SIPS implementations are determined in this stage.
3) Design and Implementation: In the implementation stage, practical issues need to
be addressed as listed in [16]. Questions regarding the technology/functional
requirement, cost effectiveness, maintenance plan, complexity, redundancy, logic
development and physical architecture of the implementation need to be discussed.
4) Commissioning & Periodic Testing: Successful implementation of SIPS
relies on a proper testing plan which should include lab testing, field testing, study
validation and periodic testing.
5) Training & Documentation: SIPS failures caused by faulty logic design and human
errors account for 42% of total failures based on a previous survey [17]. Proper training
of operating and maintenance staff helps reduce human errors and ensure reliable
operation. Complete documentation about SIPS functionalities and their operational
record will improve the efficiency in staff training.
Figure 2-2: SIPS Design Process (System Study, Solution Development, Design & Implementation, Commissioning & Periodic Testing, Training & Documentation)
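As an illustration of the solution development step, the sketch below sizes a generator rejection for a thermal overload. It assumes a linear (DC power flow) sensitivity, called `shift_factor` here, giving the MW of relief on the monitored line per MW of generation rejected, and identical generating units tripped whole. Both are simplifying assumptions for illustration, not a method prescribed by the design guidance cited above.

```python
import math


def generation_to_reject(post_contingency_flow_mw: float,
                         thermal_limit_mw: float,
                         shift_factor: float,
                         unit_size_mw: float) -> tuple[float, int]:
    """Return (minimum MW to reject, number of whole units to trip).

    shift_factor: assumed linear sensitivity, i.e. MW of line-flow relief
    per MW of generation rejected (must be positive).
    """
    if shift_factor <= 0:
        raise ValueError("shift_factor must be positive for rejection to relieve the line")
    excess_mw = post_contingency_flow_mw - thermal_limit_mw
    if excess_mw <= 0:
        return 0.0, 0  # no overload, so no action is required
    required_mw = excess_mw / shift_factor
    units = math.ceil(required_mw / unit_size_mw)  # units can only be tripped whole
    return required_mw, units


print(generation_to_reject(1300.0, 1100.0, 0.5, 150.0))  # (400.0, 3)
```

Because units are tripped whole, the scheme in this example rejects 450 MW to cover a 400 MW requirement; a real study would also check that the extra rejection does not create frequency or voltage problems.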
The operation requirement of SIPS is derived from system planning studies which
identify the performance criteria following system contingencies. Among the most
important features identified from the system study are [18]:
2.2.1. Initiating Conditions
The critical system contingencies that initiate SIPS operation when the scheme is armed are
identified as SIPS initiating conditions. These may require local or wide-area devices to
measure the following parameters:
Voltages and/or currents
Frequency
Control signals, e.g. automatic voltage regulator, Power System stabilizer,
generator governors, reactive power compensation including HVDC converters and
FACTS, etc.
Status including circuit breaker position, tap-changer position, and disturbance
recorder start signals, etc.
Arming: levels, thresholds, automatic/dynamic, and manual.
The arming criteria determine the system conditions for which SIPS are switched into
the standby mode and are ready to take control actions. SIPS are normally designed to
monitor the load level, generation level, voltage, frequency, breaker status and other
quantities which help identify the emergence of Power System problems. The
information required at different locations can be collected using SCADA and EMS
computers and then processed by programmable logic controllers, microprocessor-based
relays and other IEDs. The arming process can be done either automatically or manually.
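The relationship between arming and initiating conditions can be sketched as follows, assuming a single automatic arming criterion (a monitored power flow at or above a threshold) that an operator may override manually. The function names and threshold values are hypothetical.

```python
from typing import Optional


def is_armed(monitored_flow_mw: float, arming_threshold_mw: float,
             manual_override: Optional[bool] = None) -> bool:
    """Return True if the SIPS is in standby (armed) mode.

    manual_override: None for automatic arming; True/False when an
    operator has manually armed/disarmed the scheme.
    """
    if manual_override is not None:
        return manual_override
    return monitored_flow_mw >= arming_threshold_mw


def should_act(armed: bool, initiating_condition: bool) -> bool:
    """Take control action only when the scheme is armed AND an initiating condition occurs."""
    return armed and initiating_condition


print(should_act(is_armed(850.0, 800.0), initiating_condition=True))   # True
print(should_act(is_armed(700.0, 800.0), initiating_condition=True))   # False
print(should_act(is_armed(700.0, 800.0, manual_override=True), True))  # True
```

Separating arming from triggering means the same contingency produces no action under light loading, which limits unnecessary operations.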
2.2.2. Time Requirements
The maximum allowable time for the remedial action to be accomplished needs to be
determined. Stability problems typically have the fastest “action” requirements; they
may be as fast as a few cycles, but usually require operation in less than one second.
Voltage collapse problems may allow a response to be delayed for several seconds,
whilst actions to mitigate thermal overloading could occur after several minutes.
A SIPS may operate in a stand-alone mode to provide fast actions using local data, or it may
use system wide data for decision making. The latter may require longer operating times
due to the communication of data between the measuring devices and the control centre.
Examples of IED based SIPS include detecting changes in system topology and
detecting loss of synchronism. EMS based SIPS take a more ‘static’ view of the Power
System, and generally use the communication interface of the SCADA/EMS function.
Actions such as optimal power flow and emergency load control can be used by this type of
SIPS.
Figure 2-3: System Integrity Protection Scheme Typical Operating Times (communication requirements, from local to centralised, against operating times from milliseconds through seconds to minutes, for IED, wide-area and energy management System Integrity Protection Schemes)
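A simple way to check a design against these time requirements is to add up the delays of each sequential stage and compare the total with the allowable time. In the sketch below the stage names and delays are hypothetical; the 1000 ms allowance reflects the "less than one second" figure quoted above for stability problems.

```python
def operating_time_ms(stage_times_ms: dict[str, float]) -> float:
    """End-to-end SIPS operating time as the sum of sequential stage delays."""
    return sum(stage_times_ms.values())


def check_time_budget(stage_times_ms: dict[str, float],
                      allowable_ms: float) -> tuple[bool, float]:
    """Return (requirement met?, remaining margin in ms)."""
    total = operating_time_ms(stage_times_ms)
    return total <= allowable_ms, allowable_ms - total


# Hypothetical stage delays for a wide-area scheme addressing a stability problem.
wide_area = {
    "detection": 20.0,
    "communication to controller": 30.0,
    "decision making": 50.0,
    "communication to field device": 30.0,
    "breaker opening": 60.0,
}
print(check_time_budget(wide_area, allowable_ms=1000.0))  # (True, 810.0)
```

The two communication entries show why wide-area schemes are slower than stand-alone IED schemes: a local scheme would omit both.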
2.2.3. Redundancy Consideration
As with conventional protection, redundancy has to be considered in SIPS design.
This ensures that the removal of one scheme component following a failure, or perhaps
maintenance, does not affect the normal operation of the scheme. Redundancy
requirements cover each aspect of a SIPS design, including detection, arming, power
supply, communication IEDs and logic controllers. In addition, since the communication system is the backbone
of a SIPS application, the reliability of the overall communication path becomes critical
in SIPS operation. Therefore, the SIPS must continue to operate normally after the loss of a
single communication path.
The introduction of redundancy in a SIPS will lead to an increased probability of
unwanted SIPS operation (i.e. security-based maloperation, SBM). Similar to failure to operate, undesired SIPS
operation will also have an adverse impact on the Power System. Therefore, a voting or
vetoing scheme could be designed to balance the trade-off between SIPS dependability
and security.
a) Voting: the logic solver, upon receiving multiple commands from duplicated
detection systems, applies a voting provision: if any one of the systems detects a
line outage, the logic solver makes the trip decision to initiate the event-based GRS.
b) Vetoing: vetoing logic compares the output signals from multiple systems. The
logic solver needs to validate the decision between the redundant systems prior to
issuing any trip decision. If the outputs of the systems differ from each other,
the logic solver will veto the trip decision, enhancing system security. Therefore,
incorrect SIPS operation due to misinterpretation of inputs or data will be mitigated.
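The trade-off can be quantified under simple assumptions: two identical, independent detection channels, each failing to detect a genuine event with probability `p_fail` and producing a false trip signal with probability `p_false`. Voting as described above corresponds to 1-out-of-2 logic and vetoing to 2-out-of-2 logic. The Python sketch below uses illustrative probabilities and ignores common-mode failures, which in practice can dominate.

```python
def one_out_of_two(p_fail: float, p_false: float) -> tuple[float, float]:
    """Voting: trip if either channel detects. Returns (P_DBM, P_SBM)."""
    p_dbm = p_fail ** 2             # both channels must miss the event
    p_sbm = 1 - (1 - p_false) ** 2  # a false signal from either channel trips
    return p_dbm, p_sbm


def two_out_of_two(p_fail: float, p_false: float) -> tuple[float, float]:
    """Vetoing: trip only if both channels agree. Returns (P_DBM, P_SBM)."""
    p_dbm = 1 - (1 - p_fail) ** 2   # a single missed detection vetoes the trip
    p_sbm = p_false ** 2            # both channels must signal falsely
    return p_dbm, p_sbm


for name, scheme in (("voting (1oo2)", one_out_of_two), ("vetoing (2oo2)", two_out_of_two)):
    p_dbm, p_sbm = scheme(p_fail=0.01, p_false=0.02)
    print(f"{name}: P(DBM) = {p_dbm:.4f}, P(SBM) = {p_sbm:.4f}")
```

With these numbers, voting reduces P(DBM) to 0.0001 but raises P(SBM) to about 0.04, while vetoing does the reverse, which is the dependability and security trade-off described above.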
2.3. SIPS: Industry Experience
2.3.1. SIPS Applications and Maloperations
Surveys to investigate SIPS in existence worldwide were conducted by IEEE and
CIGRE in 1989, 1996 and 2009 respectively [14, 17, 19]. The results show a significant
growth in the number of schemes as indicated in Table 2-2. It is apparent that SIPS are
now widely used by electrical utilities as a solution to defend against large disturbances.
Consequently, reliable operation of SIPS needs to be ensured, as failure to achieve
adequate reliability exposes the Power System to additional risks, especially those
resulting from SIPS maloperations.
Table 2-2: SIPS Survey Results
Item | 1989 Survey | 1996 Survey | 2009 Survey
Respondents | 18 | 49 | 110
Schemes | 93 | 111 | 958
When operating as designed, SIPS can effectively prevent system degradation during
contingencies. However, due to its critical role in preserving system integrity,
misoperations of SIPS normally lead to severe and costly consequences, which raised
concerns when SIPS were first implemented. The 1996 survey asked respondents to
estimate the costs of both failure to operate and unnecessary operation of SIPS. The
results indicated that the cost of SIPS failure can be very high, since most respondents
selected the highest cost category, above 500,000 USD. Unnecessary SIPS operation
incurs a lower cost than failure to operate, with a penalty between 10,000 and
100,000 USD. Therefore, the assessment of SIPS performance should consider both
dependability and security. According to PRC-004-WECC-1 [18], SIPS failures can be
classified into two types:
a) Dependability-based Maloperation (DBM): Dependability-based maloperation is
the absence of a protection system or RAS operation when required. Dependability
is a component of reliability and is the measure of a device’s certainty to operate when
required.
b) Security-based Maloperation (SBM): Security-based maloperation refers to a
misoperation caused by the incorrect operation of a protection system or RAS.
Security is a component of reliability and is the measure of a device’s certainty not
to operate falsely.
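The survey cost figures suggest a simple way to combine dependability and security into one measure: an expected annual cost of maloperation. The sketch below is a minimal illustration, not the risk assessment methodology developed in this thesis. It assumes a known number of SIPS demands per year, a probability of DBM per demand, a rate of SBM per year and a fixed cost per event; the costs used (500,000 USD per DBM and 100,000 USD per SBM) are taken from the boundaries of the survey cost categories purely for illustration.

```python
def expected_annual_risk(demands_per_year: float, p_dbm: float, cost_dbm: float,
                         sbm_per_year: float, cost_sbm: float) -> dict[str, float]:
    """Expected yearly cost (USD) of dependability- and security-based maloperations."""
    dbm_risk = demands_per_year * p_dbm * cost_dbm  # failures to operate when required
    sbm_risk = sbm_per_year * cost_sbm              # unnecessary operations
    return {"DBM": dbm_risk, "SBM": sbm_risk, "total": dbm_risk + sbm_risk}


risk = expected_annual_risk(demands_per_year=2, p_dbm=0.01, cost_dbm=500_000,
                            sbm_per_year=0.05, cost_sbm=100_000)
print({k: round(v) for k, v in risk.items()})  # {'DBM': 10000, 'SBM': 5000, 'total': 15000}
```

Expressing both failure types in the same monetary unit allows design options, such as the voting and vetoing logic discussed earlier, to be compared directly.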
The system disturbance reports published by the North American Electric Reliability
Corporation (NERC) were reviewed to identify the root cause of SIPS failures. NERC
has published its findings on system disturbances, demand reductions and unusual
occurrences in the bulk Power Systems in North America since 1979. With a mission to
assure Power System reliability and security, NERC’s area of responsibility covers the
continental United States, Canada, and the northern portion of Baja California, Mexico.
From 1986 to 1995, 24 system disturbances involved operation of SIPS [20].
Among them, 16 cases were reported as successful operations, while 8 involved
operational failures. This corresponds to one failure in every three recorded SIPS operations (8 of 24), an extremely high failure rate.
Table 2-3: SIPS Failures Recorded by NERC from 1986 to 1995
Events | SIPS Type | Main Cause | Consequence | Date
WSCC-Northeast/Southeast Separation Scheme | System Separation | Faulty design | 1902 MW generation lost and 253 MW load interruption | 04/04/1988
NPCC-Hydro-Québec | Load Rejection | Hardware failure | System-wide blackout | 18/04/1988
NPCC-Hydro-Québec | Load Rejection | Hardware failure | 3950 MW load interruption | 15/11/1988
British Columbia Hydro / TransAlta Separation | Controlled opening of lines | Arming failure | Cascade line outage | 07/01/1990
Garrison-Taft 500kV No.1&2 outages | Var Compensation | Faulty logic design | 119 MW generator tripping/25 MW load interruption | 08/01/1990
SE Idaho/SW Wyoming Outage | Generator Rejection | Hardware failure | Cascade line outage | 09/12/1991
Pacific AC Intertie Separation | System Separation | Software failure | Failed to separate system, but no severe consequences | 17/11/1991
Minnesota - Wisconsin Interface 69 kV conductor burn down | Controlled opening of lines | Wrong settings | Two 69 kV lines burned | 13/10/1992
25 SIPS maloperations were reported during the period from 2000 to 2009 [9], more
than three times the number of SIPS maloperations from 1986 to 1995. Given the
significant growth in the number of SIPS, this increase is to be expected. Among them,
18 cases were identified as unnecessary SIPS operation (i.e. SBM), accounting for 72% of
the total operational failures, while 7 cases were caused by dependability-based
misoperations. A SIPS maloperation can be caused by hardware failure, software
failure, faulty design logic or human error. Figure 2-4 shows the causes of SIPS
misoperations. Among the recorded SIPS maloperations, hardware failures are the most
common cause, accounting for 36% of the total failures. Hardware failures normally result from physical stress
on the installed components, while software failures are caused by errors embedded in
vendor- or user-written code, application software and utility software. Faulty design
logic may occur as a result of an inappropriate or incomplete system study during SIPS
design. Human errors can be classified based on whether they are associated with
construction, operation, or maintenance.
The SIPS historical maloperation record indicates that the majority of SIPS maloperations are
SBMs. This is because protection systems were originally designed with a bias towards
dependability. Consequently, the SIPS design considerations and operational logic are
further discussed in this thesis to effectively balance the trade-off between SIPS
dependability and security. Currently, component hardware failures are the most
common cause of SIPS failures. However, in the future, with the application of more
ICT and IEDs in SIPS, software is more likely to become the main issue
leading to SIPS failure. A detailed study of SIPS communication architectures will be
provided in the next chapter considering the penetration of modern ICTs.
Figure 2-4: SIPS Maloperations and Causes from 2000 to 2009 NERC Reports
2.3.2. SIPS Reliability Criteria
The 1996 survey indicated most of the reliability criteria for the SIPS designs were
qualitative rather than quantitative. Moreover, some respondents did not use any
reliability criterion to assess the performance of the SIPS. This situation has
significantly changed following the global proliferation of SIPS as well as the increase
in SIPS maloperations. Currently, organisations such as the North American Electric Reliability
Corporation (NERC) have developed multiple SIPS reliability standards and assessment
procedures. This includes the description of the system studies that need to be carried
out prior to initial installation and commissioning, the periodic assessment procedures
and the historical SIPS performance database, etc. A few of these standards are reviewed below
[21]:
PRC-004-WECC-1 Protection System and Remedial Action Scheme (RAS)
Misoperation: This is a regional reliability standard designed to ensure that all
generation and transmission protection system maloperations and all transmission-related
SIPS maloperations are analysed and mitigated. The following requirements apply to
Western Electricity Coordinating Council (WECC) RAS: 1) All RAS operations and
tripped transmission elements need to be reviewed within 24 hours to assess the
correctness of the operation. 2) If a RAS has a security-based maloperation, it needs to
be removed from the system within 22 hours; a RAS with either a DBM or an SBM is
required to be replaced with a functionally equivalent protection system
(FEPS) within 20 business days. 3) Transmission owners are required to submit RAS
maloperation incident reports to WECC within 10 business days, identifying the main
causes of the incident and assisting the repair and replacement of the maloperated RAS.
PRC-015-0 Special Protection System Data and Documentation: This standard aims to
ensure the proper design and coordination of all SIPS. It also specifies the maintenance
and testing procedures and ensures that maloperations are analysed and corrected. A
database needs to be created and maintained for each installed RAS, including the
following information: 1) the contingencies and system conditions for which the RAS is
required to operate; 2) the remedial actions taken by the RAS in response to a system
contingency; 3) the detection logic and relay settings of the RAS.
Information Required to Assess the Reliability of a RAS Guideline: This document
provides a framework for the Remedial Action Scheme Reliability Subcommittee
(RASRS) to evaluate SIPS. It describes the procedure for periodic SIPS assessment and
the information required for reliability assessment. A RAS review is required prior to
initial installation and commissioning, before significant modifications or extensions,
after a failed operation and before removal from service. The periodic assessment needs
to be performed at least every five years for compliance with NERC and WECC
standards.
There are also international standards that can be applied quantitatively to SIPS
reliability assessment. The International Society of Automation (ISA) and the
International Electrotechnical Commission (IEC) define Safety Integrity Level (SIL) as
“a relative level of risk-reduction provided by a safety function, or it can be used to
specify a target level for risk reduction” [22]. SIL can be expressed as a probability of
failure on demand, P(DBM), or as a risk reduction factor (RRF), where RRF = 1/P(DBM).
Table 2-4 describes the four SIL levels in terms of P(DBM) and RRF, with SIL 4 being
the level with the highest reliability and SIL 1 the lowest.
Table 2-4: ISA and IEC Defined Safety Integrity Level (SIL)
SIL Availability P(DBM) RRF
4 >99.99% 1E-05 to 1E-04 10,000 to 100,000
3 99.90-99.99% 1E-04 to 1E-03 1,000 to 10,000
2 99.00-99.90% 1E-03 to 1E-02 100 to 1,000
1 90.00-99.00% 1E-02 to 1E-01 10 to 100
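The mapping in Table 2-4 can be expressed as a short lookup. The sketch below is a minimal illustration, assuming that a P(DBM) value lying exactly on a band edge is assigned the lower (more conservative) SIL and that values better than 1E-05 are reported as SIL 4. The function names `sil_level` and `risk_reduction_factor` and the example P(DBM) are hypothetical, not taken from the ISA or IEC standards.

```python
# SIL bands from Table 2-4 (low-demand mode): SIL n covers
# 1E-(n+1) <= P(DBM) < 1E-n.
SIL_BANDS = (
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
)


def risk_reduction_factor(p_dbm: float) -> float:
    """RRF is the reciprocal of the probability of failure on demand."""
    if not 0.0 < p_dbm <= 1.0:
        raise ValueError("P(DBM) must lie in (0, 1]")
    return 1.0 / p_dbm


def sil_level(p_dbm: float) -> int:
    """Return the SIL met by a given P(DBM); 0 means not even SIL 1.

    A value exactly on a band edge is given the lower (conservative) SIL,
    and anything better than 1E-5 is capped at SIL 4, the highest level.
    """
    if p_dbm < 1e-5:
        return 4
    for level, low, high in SIL_BANDS:
        if low <= p_dbm < high:
            return level
    return 0


if __name__ == "__main__":
    p = 5e-4  # hypothetical P(DBM) of a scheme under study
    print(f"P(DBM)={p:g}  RRF={risk_reduction_factor(p):,.0f}  SIL={sil_level(p)}")
```

For the hypothetical P(DBM) of 5E-04, the scheme sits in the SIL 3 band with an RRF of 2,000.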
Whilst the safety integrity level (SIL) is used to evaluate system dependability, the
spurious trip level (STL) complements SIL by defining the probability of unscheduled
spurious trips of the system. Table 2-5 shows the range of STL levels, expressed as the
probability of spurious operation, P(SBM), and spurious trip reduction (STR) values [23].
The higher the STL level, the lower the probability that spurious trips will occur in the
system. Improving SIPS operational performance in terms of both SIL and STL can be a
complex process, since any increase in SIL may result in a decrease in STL. In
practice, an increase in system redundancy may improve system dependability but
degrade system security. Therefore, SIPS reliability enhancement methods need to be
carefully designed to balance the trade-off between security and dependability.
Table 2-5: Spurious Trip Level (STL) in terms of P(SBM) and STR
STL P(SBM) STR
4 1E-05 to 1E-04 10,000 to 100,000
3 1E-04 to 1E-03 1,000 to 10,000
2 1E-03 to 1E-02 100 to 1,000
1 1E-02 to 1E-01 10 to 100
In general, STL X corresponds to P(SBM) from 1E-(X+1) to 1E-X and STR from 10^X to 10^(X+1).
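The dependability/security trade-off described above can be made concrete with two textbook redundancy arrangements. The sketch below assumes two identical, independent channels with no common-cause failures and treats failure-to-act and spurious tripping as independent modes. The per-channel probabilities are hypothetical. A one-out-of-two (1oo2) arrangement trips if either channel trips; a two-out-of-two (2oo2) arrangement trips only if both do.

```python
from dataclasses import dataclass

# Band edges shared by Tables 2-4 and 2-5: level n covers 1E-(n+1) <= p < 1E-n.
BANDS = ((4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1))


def band_level(p: float) -> int:
    """SIL (for p = P(DBM)) or STL (for p = P(SBM)); 0 if below level 1."""
    if p < 1e-5:
        return 4
    for level, low, high in BANDS:
        if low <= p < high:
            return level
    return 0


@dataclass(frozen=True)
class Channel:
    q_dbm: float  # probability a single channel fails to act on a genuine demand
    q_sbm: float  # probability a single channel issues a spurious trip


def one_out_of_two(ch: Channel) -> tuple[float, float]:
    """Trip if either of two independent channels trips."""
    p_dbm = ch.q_dbm ** 2                  # both channels must fail to act
    p_sbm = 1.0 - (1.0 - ch.q_sbm) ** 2    # either channel can trip spuriously
    return p_dbm, p_sbm


def two_out_of_two(ch: Channel) -> tuple[float, float]:
    """Trip only if both independent channels trip."""
    p_dbm = 1.0 - (1.0 - ch.q_dbm) ** 2    # either failing channel blocks the trip
    p_sbm = ch.q_sbm ** 2                  # both must trip spuriously
    return p_dbm, p_sbm


if __name__ == "__main__":
    ch = Channel(q_dbm=8e-3, q_sbm=8e-3)   # hypothetical channel figures
    for name, (p_dbm, p_sbm) in {
        "single": (ch.q_dbm, ch.q_sbm),
        "1oo2": one_out_of_two(ch),
        "2oo2": two_out_of_two(ch),
    }.items():
        print(f"{name}: SIL {band_level(p_dbm)}, STL {band_level(p_sbm)}")
```

With these hypothetical figures, a single channel is SIL 2 and STL 2. Adding a parallel (1oo2) channel raises the SIL to 4 but lowers the STL to 1, while a 2oo2 arrangement does the reverse. This mirrors the observation that added redundancy can improve dependability at the expense of security.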
2.4. Existing SIPS Applications
This section provides a better understanding of the deployment, design and operation
of SIPS by reviewing some existing SIPS applications. The sensing, communication and
control technologies used in SIPS, together with the associated scheme topologies and
reliability enhancement strategies, are illustrated.
2.4.1. Dinorwig Intertrip Scheme
The Dinorwig Intertrip scheme, deployed at the Dinorwig pumped-storage hydro station in
North Wales, is designed to preserve the stability of the North Wales supergrid area.
Commissioned in 1984, Dinorwig station is composed of six 330 MVA generators, which
are also capable of operating as six 312 MVA motors for pumping [24]. The original
purpose of the station was to provide storage capacity for the excess power generated
by the nearby nuclear stations at times of low demand; in the early 1980s, Britain had an
excess of base-load nuclear generation during summer nights. Nowadays, Dinorwig
is operated as short-term operating reserve (STOR) and provides a fast response to rapid
changes in power demand (e.g. sudden load pickup) or the sudden loss of power stations.
Figure 2-5: One Line Diagram of North Wales Supergrid [25]
The Trawsfynydd-Deeside and Trawsfynydd-Legacy circuits are among the most
critical circuits in the North Wales Supergrid. An outage of either circuit followed by a
fault resulting in the loss of both Deeside-Pentir circuits would leave only one
operational circuit from the North Wales power stations to the rest of the GB system.
This could cause instability at both Dinorwig and the nearby nuclear stations and result
in a high probability of circuit overloading. These system emergencies can be
effectively alleviated by tripping a number of generators or motors at Dinorwig.
The intertrip scheme uses two power-measuring relays that monitor the power absorbed
(pumping) and the power generated by Dinorwig. This ensures that the machines at
Dinorwig are not tripped unless the power generated or absorbed by Dinorwig exceeds a
level at which overloading or instability may occur. The status of the Deeside-Pentir
1 & 2 circuits is determined by monitoring the associated circuit breakers and the line and
busbar disconnectors; in addition, activation signals can be received from the main
protection relays of the two circuits. If both circuits are out of service simultaneously,
an intertrip signal is transmitted to Dinorwig to initiate the scheme. When a tripping
signal is received by the Dinorwig intertrip scheme, two
machines will be tripped to ensure the remaining power transfer to and from Dinorwig
does not exceed 1250 MW, preserving the stability of the area.
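The intertrip decision described above combines circuit-status monitoring with a power-level check. The sketch below is a hypothetical simplification and not the actual Dinorwig logic. It assumes that a circuit counts as lost if any series switching device is open or its main protection has operated, and that the relay threshold (`threshold_mw`) is a free parameter, since the text does not give its setting. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitStatus:
    """Monitored status of one Deeside-Pentir circuit."""
    breaker_closed: bool
    line_disconnector_closed: bool
    busbar_disconnector_closed: bool
    main_protection_operated: bool  # activation signal from the main protection

    def out_of_service(self) -> bool:
        # Assumption: the circuit counts as lost if any series switching
        # device is open or its main protection has operated.
        return (not self.breaker_closed
                or not self.line_disconnector_closed
                or not self.busbar_disconnector_closed
                or self.main_protection_operated)


def machines_to_trip(dp1: CircuitStatus, dp2: CircuitStatus,
                     generated_mw: float, absorbed_mw: float,
                     threshold_mw: float) -> int:
    """Number of Dinorwig machines to trip (0 or 2).

    threshold_mw is a hypothetical setting of the power-measuring relays.
    """
    intertrip = dp1.out_of_service() and dp2.out_of_service()
    power_high = generated_mw > threshold_mw or absorbed_mw > threshold_mw
    return 2 if intertrip and power_high else 0


if __name__ == "__main__":
    in_service = CircuitStatus(True, True, True, False)
    tripped = CircuitStatus(False, True, True, True)
    print(machines_to_trip(tripped, tripped, generated_mw=1700,
                           absorbed_mw=0, threshold_mw=1000))
    print(machines_to_trip(tripped, in_service, generated_mw=1700,
                           absorbed_mw=0, threshold_mw=1000))
```

The power check acts as a supervising condition. Even a genuine double-circuit loss does not trip machines when Dinorwig's transfer is too low to threaten stability, which limits unnecessary generation loss.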
The line outage detection methods deployed in existing Generator Rejection Schemes
(GRS) are now reviewed. Fast detection of a line outage is considered an effective way to
initiate a system integrity protection scheme designed to prevent Power System collapse
during severe events. According to a survey conducted in 2010, line outage detection
can take three main forms, depending on the level of security required by the scheme.
a) Monitoring breaker auxiliary contacts: This is a relatively simple mechanism;
however, it can be insecure for two reasons. First, the switching mechanism of the
breaker auxiliary contacts can fail, especially during routine breaker testing. Second,
spurious breaker open signals can be generated unintentionally by transients caused
by other control signals. For example, coupling between the breaker auxiliary contact
wiring and other control wiring in the same cableway can produce transients that look
like breaker open signals. Such transients can be rejected by input-circuit debounce;
however, debouncing may also significantly delay the detection of genuine breaker
open signals by the SIPS.
b) Monitoring breaker status AND an “undercurrent” signal: A more secure
mechanism combines breaker auxiliary contact status with current measurements on
the line. Most digital relays can make the zero-current decision within half a cycle,
resulting in more secure line outage detection [14].
c) Monitoring protective relay trip signals: This mechanism is used when the speed
of outage detection is paramount. Both the relay trip signals and the breaker failure
outputs need to be monitored by the scheme.
The GRS uses a combination of breaker status and undercurrent signals (i.e. method b)
for line outage detection, as shown in Figure 2-6. The decision is based on undercurrent
(UC) detection on all three phases of the line combined with the associated breaker open
condition. In addition, an appropriate time delay is added to avoid false line outage
detection caused by power system transients.
Figure 2-6: Line Outage Detection Logic used in GRS
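The logic of Figure 2-6 (three-phase undercurrent AND breaker open, held for a time delay) can be sketched as a sample-by-sample timer. This is a minimal illustration, not the GRS implementation. It assumes RMS current magnitudes, integer millisecond timestamps, and a timer that resets whenever the condition drops out. The threshold and delay values in the test are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# One sample: (time in ms, |Ia|, |Ib|, |Ic| in A RMS, breaker auxiliary contact shows open)
Sample = Tuple[int, float, float, float, bool]


def detect_line_outage(samples: Iterable[Sample], uc_threshold_a: float,
                       delay_ms: int) -> Optional[int]:
    """Return the time (ms) at which a line outage is declared, or None.

    The outage condition is: breaker open AND undercurrent on all three
    phases. It must hold continuously for delay_ms; any interruption resets
    the timer, so a short transient does not produce a false outage.
    """
    start: Optional[int] = None
    for t_ms, ia, ib, ic, breaker_open in samples:
        undercurrent = max(ia, ib, ic) < uc_threshold_a
        if breaker_open and undercurrent:
            if start is None:
                start = t_ms          # condition just appeared: start timing
            if t_ms - start >= delay_ms:
                return t_ms           # condition held long enough: declare outage
        else:
            start = None              # condition dropped out: reset the timer
    return None
```

The undercurrent check guards against a failed or spuriously operated auxiliary contact, because a breaker "open" indication with load current still flowing is ignored. The time delay guards against short transients on either input.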
2.4.2. PacifiCorp’s Jim Bridger RAS
The Jim Bridger Power Plant is located 22.7 miles east of Rock Springs in southwestern
Wyoming and is equipped with four 550 MW generators [26]. The power plant is
connected to the eastern Idaho transmission system via three 345 kV lines. There are
three 345 kV/230 kV transformers at Jim Bridger and three 230 kV transmission lines
connecting to the Wyoming transmission system. Loss of any transmission line from
Jim Bridger to the western transmission system (Idaho) will cause overloading and lead
to system instability. In addition, during a fault, the voltage at the generator terminals
drops significantly and the generators accelerate. These problems continue until the
faulted transmission line is disconnected from the system. Once this occurs, the
impedance of the transmission path increases. The combination of increased path
impedance and generator acceleration causes oscillations between the generator rotors
and the Power System, which may lead to a generator out-of-step condition and
voltage swings at the Jim Bridger substation. Without the RAS, or when the RAS is not
in service, the output of Jim Bridger is restricted to 60% of its capacity. The Jim Bridger
RAS is therefore required to maximise power transfer on the existing transmission
network and to protect against dynamic stability problems. The following control
functions are performed by the RAS [27]:
Generator tripping:
- Arming level calculation
- Generation tripping requirement calculation
- Selection of units to trip
Series capacitor bypass control at Burns 525 kV reactive station (capacitor provides
30% compensation on the Midpoint to Summer Lake 525 kV line)
Shunt capacitor bank insertion at Kinport 345 kV and Goshen 161 kV
Permission for line series capacitor insertion at the Jim Bridger 345 kV
- Permission from Jim Bridger for Lag segment (1/3 of the total installed series
compensation) insertion at each 345 kV capacitor.
- Activation of subsynchronous resonance (SSR) protection for the generating
units at the Jim Bridger Power Plant.
Figure 2-7: Geographic Overview of PacifiCorp’s Jim Bridger Transmission System [26]
Owing to the critical role of the Jim Bridger RAS in preserving system stability, redundant
“input, output and processing” units are required to enhance the dependability of the
scheme. However, under most system conditions, tripping a 530 MW unit is not
required, especially if the load level is low and the fault is only a single line-to-ground
fault. Therefore, enhancing security against false operation is critical to reducing
operational costs. Consequently, a triple modular redundant (TMR) programmable
logic controller with two-out-of-three voting logic is deployed by the RAS, as shown in
Figure 2-8. Within each RAS, three identical systems gather the input/output (I/O) data
and perform two-out-of-three voting on the status and the calculation results to confirm
whether an action is required. The TMR system provides a total loop time of less than
17 milliseconds.
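The two-out-of-three vote can be written as a simple majority function. Under the usual textbook assumption of three identical, independent channels, the vote improves both P(DBM) and P(SBM) relative to a single channel, provided the per-channel probability is below 0.5. The sketch below illustrates this principle only. It does not model the Jim Bridger controller, and it treats failure-to-act and spurious tripping as separate, independent modes. The channel probability is hypothetical.

```python
from itertools import product


def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Majority (two-out-of-three) vote on three redundant channel outputs."""
    return (a and b) or (b and c) or (a and c)


def p_at_least_two_of_three(q: float) -> float:
    """Probability that at least two of three independent channels share a fault state.

    Applied to q = P(channel fails to act) this gives the TMR P(DBM);
    applied to q = P(channel trips spuriously) it gives the TMR P(SBM).
    """
    return 3 * q ** 2 * (1 - q) + q ** 3


if __name__ == "__main__":
    # The vote agrees with "at least two channels say trip" for every input.
    for a, b, c in product((False, True), repeat=3):
        assert vote_2oo3(a, b, c) == (a + b + c >= 2)
    q = 1e-2  # hypothetical single-channel failure probability
    print(f"single channel: {q:.1e}, 2oo3: {p_at_least_two_of_three(q):.2e}")
```

Unlike the 1oo2 and 2oo2 arrangements, a single faulty channel can neither block nor force a trip. This is why TMR voting is attractive when both dependability and security matter.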
Figure 2-8: Jim Bridger RAS Triple Modular Redundant (TMR) System [26]
2.4.3. Southern California Edison Centralised RAS
Due to various green initiatives and renewable portfolio standard (RPS) mandates,
generation interconnection requests to the power grid have escalated dramatically in
recent years. A proliferation of RAS/SIPS solutions is now expected within Southern
California Edison (SCE)’s service territory to accommodate more renewable generation
economically. However, with an increasing number of RAS being installed, SCE now
faces challenges arising from the standalone nature of existing SIPS implementations.
The limited communication capabilities of RAS, especially with other parts of the
system, along with laborious maintenance and test practices, significantly impede SCE’s
ability to deploy the large number of new RAS required to satisfy all the generation
connection requests.
SCE has proposed a breakthrough solution that centralises all existing standalone SIPS,
together with their monitoring and protection functions, to achieve better RAS
coordination and maintenance. The Centralised RAS (CRAS) is made feasible by
advanced field intelligent electronic devices (IEDs), fast computing controllers, SCE’s
extensive wide area fibre communication networks and continuously developing
communication standards.
Figure 2-9: The Existing and Forecasted RASs in SCE’s Service Territory [28]
From the point of view of advanced Information and Communication Technologies (ICT),
the CRAS achieves several breakthroughs. An Intel-based modern computer capable of
IEC 61850 communications is used as the central controller instead of programmable
logic controllers (PLCs), providing faster computing capabilities. SCE has more than
7000 miles of high-speed communication network, which makes fast wide area
communication possible. IEC 61850-8-1 Generic Object Oriented Substation Event
(GOOSE) messaging [29] was chosen as the transport mechanism and is used for data
transmission across the LAN and WAN in IEC 61850 format. IEC Technical Report
61850-90-5 [30] specifies a communication protocol for event-driven GOOSE messages,
designed to extend their application from a LAN to a WAN. The WAN comprises
dual-redundant T1 and Ethernet data communication links. This is the first attempt to
apply IEC 61850 over a large-scale wide area for monitoring and protection. GOOSE
messages need to be cryptographically protected to secure WAN communication and
reduce cyber security vulnerabilities. The Group Domain of Interpretation (GDOI,
RFC 6407) can be used to distribute the symmetric keys used for data signing and
encryption [30].
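The sketch below illustrates the general principle of signing messages with a shared symmetric group key, using HMAC-SHA256 from the Python standard library as an example algorithm. It does not reproduce the IEC 61850-90-5 frame format or the GDOI key-exchange procedure. The randomly generated key only stands in for a key that GDOI would distribute, and the payload string is hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

TAG_LEN = 32  # HMAC-SHA256 digest length in bytes


def sign(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so receivers can check origin and integrity."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(frame: bytes, key: bytes) -> Optional[bytes]:
    """Return the payload if the tag is valid, otherwise None."""
    if len(frame) < TAG_LEN:
        return None
    payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking tag information through timing.
    return payload if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    group_key = os.urandom(32)  # stands in for a key distributed by GDOI
    frame = sign(b"CRAS;arm=1;trip=0", group_key)  # hypothetical payload
    print(verify(frame, group_key))
    tampered = frame[:-1] + bytes([frame[-1] ^ 0x01])
    print(verify(tampered, group_key))
```

Signing alone provides authenticity and integrity but not confidentiality. Encryption, where required, would be layered on top using the same group-key distribution.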
The overall communication network layout of the centralised RAS is shown in Figure 2-10.
As required by the SIPS design guidelines, full redundancy of the CRAS is provided
by duplication of the control centre, the monitoring and mitigation relays and the
communication network. Each substation uses redundant relay sets and implements
redundant, diversely routed telecommunication circuits to the control centres. The
Central Controllers of the CRAS are designed with triple modular redundancy (TMR)
and are installed at geographically separated locations (i.e. the Grid Control Centre (GCC)
and the Alternate Grid Control Centre (AGCC)). Each site has an active triple-redundant
controller as well as a hot-standby backup. Redundant Ethernet communication links
between the two controllers are provided to exchange the information acquired from the
substations. The CRAS also interfaces with SCE’s Energy Management System (EMS)
for data communication and model mapping.
Figure 2-10: SCE CRAS High-level Network Architecture [28]
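The benefit of duplicating each layer of the CRAS can be estimated with a basic series-parallel availability model. The sketch below rests on several simplifying assumptions. It treats the signal path as relay set, then communication link, then control centre in series, with each layer duplicated in parallel. Failures are assumed independent, with no common-mode failures and no switchover time for the hot-standby backup, and each TMR controller is treated as a single element. All availability figures are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All elements in the chain must be available."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant element must be available (independent failures)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical single-element availabilities, for illustration only.
A_RELAY, A_LINK, A_CENTRE = 0.999, 0.995, 0.9995

simplex = series(A_RELAY, A_LINK, A_CENTRE)
duplicated = series(parallel(A_RELAY, A_RELAY),
                    parallel(A_LINK, A_LINK),
                    parallel(A_CENTRE, A_CENTRE))

if __name__ == "__main__":
    print(f"non-redundant path: {simplex:.6f}")
    print(f"duplicated path:    {duplicated:.6f}")
```

Such a model captures only availability, that is, the ability to act. As discussed in Section 2.3.2, duplicated activation paths can also raise the probability of spurious operation, so the two measures need to be assessed together.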
The CRAS provides a platform for system operators to turn “situation awareness”
monitoring into actionable grid control and protection strategies. Currently, most
system protection schemes are designed based on predetermined, seasonal, off-line and
pre-planned mitigation strategies. With the CRAS platform, online, dynamic and
hierarchically layered control functions can be developed to extend the capability of
protection schemes. This will be discussed further in the following chapters.
2.5. Review of Major SIPS Maloperations
The following analysis of SIPS maloperations provides a better understanding of SIPS
failure mechanisms. It demonstrates the impact of SIPS maloperations and unintended
SIPS interactions on the propagation of system disturbances.
2.5.1. Irish System Disturbance, 5th August 2005
The Irish system disturbance caused a temporary loss of supply to 326,000 customers in
Ireland and a further disconnection of 74,000 customers in Northern Ireland. The
System Separation Detection Scheme and the Moyle Run-back Scheme were both
relevant to the development and spread of the disturbance.
1) System Separation Detection Scheme:
The Ireland and Northern Ireland systems are interconnected by the Louth-Tandragee
275 kV interconnector and by two smaller 110 kV interconnectors, between Letterkenny
and Strabane and between Corraclassy and Enniskillen. Because of the limited
transmission capacity of the two 110 kV interconnectors, the system separation scheme
is designed to trip them after detecting the loss of the main 275 kV circuit.
2) The Moyle Run-back Scheme
With the advent of the Moyle direct current link between Northern Ireland and Scotland,
the system separation detection scheme is also used to command a change in flow on the
Moyle interconnector following a loss of the main Ireland-Northern Ireland
interconnection.
The run-back scheme is designed to alter the power flow on the Moyle interconnector in
order to prevent excess power in the Northern Ireland system resulting from the loss of
interconnection to Ireland. The aim is to maximise the capacity for power flows on the
North-South Interconnector.
Before the incident, the All-Ireland Power System was operating normally with a total
demand of 3,302 MW, of which 377 MW was served by an import from Northern
Ireland. Operating reserves were normal and generating plant availability was more
than sufficient at 4,880 MW. However, several circuits were out of service for
maintenance, including one of the most critical lines, the Louth-Tandragee
No.1 275 kV circuit. This circuit is half of the main interconnector to Northern Ireland,
and its outage had a significant impact on the system disturbance.
Figure 2-11: The Ireland Transmission System Map [31]
At 10:22, an inter-trip signal was detected by the System Separation Detection Scheme
at Tandragee substation on the Louth-Tandragee No.1 275 kV circuit, which was out for
maintenance. The false detection of this signal was reported to be caused by radio
interference. The signal was sent to the Moyle DC interconnector to Scotland and
triggered a run-back, which reversed a 115 MW import into a 168 MW export and caused
the frequency to drop to 49.52 Hz. The signal was also transferred to Enniskillen and
Strabane, where it tripped the two standby 110 kV interconnectors. At 10:24, a second
run-back was incorrectly triggered on the Moyle interconnector, increasing the power
export to Scotland from 168 MW to 416 MW; the frequency dropped to 48.82 Hz, below
the 48.85 Hz trip frequency of the first stage of the under-frequency load-shedding
scheme. The second run-back was initiated by the two-minute timer at Ballycronan More,
which timed out while it was still monitoring the original inter-trip signal from
Tandragee. Both interruptible and normal tariff customers were automatically
disconnected. In addition, two generator units (i.e. Tarbert unit 4 and Moneypoint
unit 1) tripped because of the low frequency after the second run-back. This led to further
under-frequency load shedding of 15.6% of the south of Ireland system demand and
11.4% of the Northern Ireland system demand. To help recover the system frequency, the
power export to Scotland was reduced from 416 MW to 250 MW by blocking 50% of the
Moyle interconnector. Finally, most of the
customers were reconnected by 12:00, i.e. 98 minutes after operation of the System
Separation Detection Scheme.
Figure 2-12: Frequency change during Irish Disturbance on 5th August 2005 [10]
The role of the remedial protection schemes in the development of the Irish system-wide
disturbance was subsequently analysed. The primary cause of the disturbance was the
incorrect detection of a separation of the two Power Systems on the island, which
highlighted the importance of immunity to interfering signals. It was also recommended
that the outputs from the Power Line Carrier on out-of-service circuits be blocked to
prevent the receipt of spurious activation signals. To help prevent a recurrence, the latch
on the inter-trip signal from Tandragee to Ballycronan More has been removed, which
makes the communication more secure and prevents a second run-back signal being
issued by the Moyle Run-back Scheme. The under-frequency load-shedding scheme is
still considered necessary to prevent system collapse.
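The effect of removing the latch can be illustrated with a simplified timer model. The sketch below is a hypothetical reconstruction based only on the narrative above, not the actual Ballycronan More logic. It assumes that the second run-back is issued if the monitored inter-trip signal is still asserted when the two-minute timer expires, and that the signal is sampled once per second. With a latch, a brief spurious signal remains asserted and the timer fires. Without it, the timer sees only the current (cleared) signal.

```python
from typing import Sequence

RUNBACK_TIMER_S = 120  # two-minute timer at Ballycronan More, as described in the text


def second_runback(signal_on: Sequence[bool], latched: bool,
                   timer_s: int = RUNBACK_TIMER_S) -> bool:
    """Decide whether the timer issues a second run-back.

    signal_on[t] is the raw inter-trip signal t seconds after the first
    run-back. With a latch, a signal seen once stays asserted; without it,
    the timer sees only the current raw state when it expires.
    """
    seen = False
    for t, raw in enumerate(signal_on):
        seen = (seen or raw) if latched else raw
        if t >= timer_s:
            return seen
    return False  # record ends before the timer expires
```

A genuine, persistent separation signal would still produce the second run-back in both cases, so removing the latch improves security without disabling the intended function.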
2.5.2. SIPS Maloperation in Nordic Grid, 1st of December 2005
Two system protection schemes are installed in Norway to deal with the challenges
brought by high power generation north-west of the main bottleneck. A brief
introduction to the schemes is given:
1) Nordland SIPS:
The grey-shaded area (northern Scandinavia) in Figure 2-13 contains nearly 15% of the
installed hydroelectric capacity in the Nordic grid (approximately 6000 MW) but has a
low load demand. This leads to a large power transfer from the north to the south, where
the main load centre (Oslo) is located. The outage of any critical transmission corridor
may cause other transmission lines to overload. In this case, the Nordland SIPS is
designed to shed up to 1200 MW of generation in the north and, if there is surplus
generation in the northernmost part of Norway, to split and disconnect that part from
the main Nordic grid.
2) Østland SIPS:
The eastern part of Norway around Oslo, shown in the yellow-shaded region in Figure
2-13, is the main load centre. The Østland SIPS is activated when there is an outage or
overload of the central lines in the Oslo area. In this case, it is designed to shed up to
1200 MW of generation on the west coast of Norway, protecting the remaining
transmission lines in the Oslo area from overloading.
Before the incident, the Nordland scheme was armed because of high hydroelectric
production in the north-west of the Nordic grid (2300 MW out of the area), with both the
network split and generator tripping functions activated. The Østland SIPS was also
activated due to a high power transfer from Norway to Sweden (2100 MW). At 15:02 on
1st December 2005, a fault occurred on a 420 kV reactor and the associated breaker
failed to open. The fault was cleared by the operation of the busbar differential
protection at Porjus, which also caused the outage of one main transmission circuit from
northern Scandinavia through Sweden. The remaining power transfer corridors out of
northern Scandinavia then became overloaded, requiring operation of the Nordland
SIPS. However, the delayed operation of the Nordland SIPS led to a series of cascading
events. System operation outside the operational limits then required the activation of
the second SIPS, at Østland, which was expected to trip 1150 MW of production on the
west coast of Norway. The operation of the Østland SIPS would have caused a frequency
drop and eventually triggered under-frequency load shedding of 2400 MW.
Fortunately, the Østland SIPS also failed to operate, which prevented the breakdown of
the entire Nordel grid. Finally, the manual activation of 2000 MW of fast reserve
production helped stabilise the system.
Figure 2-13: Nordic Grid and the Protection Schemes [11]
The latency in the Nordland SIPS was due to delays in substation communication.
The failure of the Østland SIPS to operate, which saved the Nordic grid from collapse,
was due to a human error after maintenance testing. The overload limit of the Østland
SIPS was increased after the incident to ensure that it will not trigger in a similar
situation. This event indicated that system studies must be performed in the SIPS design
phase to ensure that the SIPS arming criteria are correctly designed. In Chapter 6, a
method to determine the arming point of a SIPS is provided by comparing the system
risks with and without SIPS. The Nordic SIPS maloperation also highlighted the
importance of reliable SIPS operation in preventing the spread of a system disturbance.
In addition,
with an increasing number of SIPS in the system, the relationship between multiple SIPS
needs to be studied to mitigate undesirable interactions. A risk assessment method to
evaluate the impact of undesirable interactions between different SIPS will be discussed
in Chapter 7. In particular, SIPS with adaptive operational logic will be designed to
mitigate the risk caused by SIPS interactions.
2.6. Summary
This chapter has described the fundamental features of SIPS and the need to evaluate
their reliability. By reviewing surveys and implementation guidelines, some critical
design considerations and the application of the latest technology to SIPS have been
described. SIPS can be applied to provide corrective control actions for various abnormal
system conditions to preserve system integrity. It is therefore vitally important to ensure
highly reliable SIPS performance in terms of both dependability and security. In recent
years, the proliferation of SIPS and the increasing number of SIPS maloperations have
both called for an effective method to quantitatively assess the additional operational
risks that a SIPS implementation brings to the overall Power System.
The review of existing SIPS applications in this chapter covers industry practices
and approaches to the use of new technologies for monitoring, communication and
control to further enhance SIPS performance. SIPS can be implemented either locally or
system-wide, with a hierarchical architecture and multi-level corrective actions. The
introduction of advanced information and communication technology (ICT) and
intelligent electronic devices (IEDs) brings more flexibility to SIPS design and operation.
The review of SIPS-related system disturbances shows that component hardware
failures are the most common cause of SIPS maloperations. However, as more ICT and
IEDs are applied in SIPS, software is likely to become the main cause of SIPS failure.
According to NERC, 72% of recent SIPS maloperations were security-based, so the
implementation of redundancy in SIPS communication networks needs to be carefully
assessed. The TMR designs illustrate the use of voting and vetoing schemes to balance
the trade-off between scheme dependability and security. Extensive wide area
communication networks, together with GPS time synchronisation, have made it
possible to centralise decision making in existing distributed SIPS and achieve
better management and coordination. A more detailed description of the ICT used in
SIPS applications will be provided in Chapter 3.
Major SIPS-related system disturbances illustrate the significant consequences of SIPS
maloperation. In the pre-cascading phase, effective and fast SIPS operation is vital in
preventing the spread of a disturbance. Incorrect, delayed or failed SIPS operation
increases the probability of the system entering the cascading phase and may
eventually lead to severe consequences such as load disconnection. The Irish
disturbance indicated the importance of secure communication for successful SIPS
operation. Redundancy in SIPS activation signals provided by duplicated
communication channels may reduce scheme security and increase the risk of
spurious SIPS operation. Consequently, the implementation of redundancy in SIPS
design needs to be assessed, and appropriate voting or vetoing logic can be used to
enhance SIPS security. In addition, both of the reviewed system disturbances involved
the operation of more than one protection scheme, which highlights the necessity of
not only assessing the reliability of individual SIPS but also understanding the possible
interactions between SIPS.
CHAPTER 3
ASSESSING THE IMPACT OF ICT
RELIABILITY ON SIPS APPLICATION
3.1. The Role of ICT in Power System Protection
The effective and reliable provision of electrical power is increasingly reliant on
information and communication technology (ICT). The massive expansion of these
technologies in the Power System allows the system operator to respond to the dangers
caused by abnormal system conditions in a more effective and timely manner. This
helps to prevent the propagation of large system disturbances [32], and ICT has been
widely applied in the design of SIPS, as introduced in Chapter 2. Moreover, the
increasingly variable operating conditions arising from the integration of significant
renewable sources and the highly interconnected network make it extremely difficult to
retain system stability without the advanced detection, monitoring and visualisation
methods enabled by ICT.
In addition, ICT has brought more flexibility to conventional protection functions and
significantly facilitates the implementation of wide area monitoring, protection and
control (WAMPAC) systems. This helps promote the development of novel protection
approaches, such as special protection schemes, real-time control of HVDC and FACTS,
and stability monitoring. The processes included in conventional protection and control
strategies are normally straightforward, largely because they do not include system-wide
supervisory control to monitor and regulate possible failures in the related fault
evolution process. Recent blackouts have also emphasised the limitations of local
protection and the need to enhance system reliability by developing WAMPAC systems
[12, 13]. Consequently, the implementation of ICT and WAMPAC will continue to help
enhance the stability and efficiency of the future Power System.
3.1.1. Impact of ICT on Power System Protection
The role of conventional Power System protection is to disconnect faulty or
overloaded elements from the rest of the electrical network. However, changes in the
Power System are making it increasingly difficult to find a single protection setting
that suits all system conditions and operating contingencies. Meanwhile, Power
Systems are becoming more vulnerable to system-wide disturbances, which require a
coordinated wide area response across the entire system. The WAMPAC system, made
available by advances in ICT, is expected to improve the performance of the protection
system, especially in the following aspects:
1) Managing wide area disturbances: the introduction of wide area monitoring into the protection system improves the resilience of Power Systems against stressed conditions and wide area disturbances. The increasing availability of real-time wide area measurements has enabled SIPS to apply fast and adaptive protection actions for system contingencies. Another example is the use of a WAM system to supervise the back-up protection (i.e. Zone 3) of a distance relay, whose maloperation was identified as a main contributor to recent blackouts [12, 15]. The backup zones of distance relays can be entered during extreme system conditions such as heavy loading, power swings and generator loss-of-field. WAMPAC could improve the performance of the backup protection by restraining the backup relays during load encroachment or power swings. As illustrated in Figure 3-1, when the Zone 3 element of relay A picks up and a significant negative-sequence current is detected, the fault is unbalanced and the Zone 3 pickup is appropriate and necessary. However, if the line currents are balanced, the Zone 3 pickup could be caused either by a balanced three-phase fault or by load encroachment. In this case, the remote PMUs installed within the protection zone of the backup relay monitor the currents and determine whether any of the remote relays (B to E in Figure 3-1) sees a three-phase fault within its Zone 1. If none of them detects the three-phase fault, the Zone 3 pickup of relay A is attributed to load encroachment and the trip is blocked.
[Figure: Zone 3 of relay A overlaps Zone 1 of relays B, C, D and E. Supervision of Zone 3 of relay A: on a Relay A Zone 3 trip, check for a Zone 1 pickup at relays B to E; Yes: Relay A Zone 3 trip proceeds; No: block Relay A Zone 3.]
Figure 3-1: Supervision of Backup Relays to Prevent Zone 3 Maloperation
2) Mitigating the impact of hidden failures: A hidden failure is defined as a permanent, undetected defect in a protection relay which causes the relay to operate incorrectly, removing elements of the system as a consequence of another switching event in the system [33]. The impact of a hidden failure can be tackled by using ICT to collect measurements from multiple relays and using this information to confirm the trip decision. This could prevent a single hidden failure in any one of these relays from causing an incorrect operation.
3) Adaptive relaying: The concept of adaptive relaying is that protection devices automatically adjust themselves to suit prevailing system conditions. One example of an adaptive relaying application is a design that balances the dependability and security of a protection scheme. Power System protection has traditionally been designed with a bias towards dependability [34], which is beneficial in a system with a robust transmission network and sufficient generation reserve. However, during a wide area disturbance, the erroneous loss of an unfaulted element can be a major threat to a stressed system and can accelerate the process leading to cascading failure and even blackout. Consequently, the preference for dependability may result in inappropriate tripping and bring greater risk to the system. Therefore, shifting the balance from dependability towards security during stressed system conditions, as detected by the wide area monitoring system, could be attractive.
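The supervision logic of Figure 3-1 and the dependability/security trade-off of item 3 can be sketched as two Boolean functions. This is a minimal illustration under stated assumptions, not a relay implementation: the function names, the negative-sequence threshold value and the idea of expressing the dependability/security bias as a k-out-of-n trip vote are all hypothetical and introduced here only for illustration.

```python
from typing import Sequence

# Hypothetical threshold (an assumption, not a value from the text): ratio of
# negative- to positive-sequence current above which the fault is treated as unbalanced.
NEG_SEQ_RATIO_THRESHOLD = 0.1


def supervise_zone3(zone3_picked_up: bool,
                    i2_over_i1: float,
                    remote_zone1_pickups: Sequence[bool]) -> bool:
    """Return True if relay A's Zone 3 trip may proceed, False if it is blocked."""
    if not zone3_picked_up:
        return False
    # Significant negative-sequence current: an unbalanced fault, so the pickup is genuine.
    if i2_over_i1 > NEG_SEQ_RATIO_THRESHOLD:
        return True
    # Balanced currents: a three-phase fault or load encroachment. Allow the trip
    # only if at least one remote relay (B to E) sees the fault in its Zone 1.
    return any(remote_zone1_pickups)


def vote_trip(trip_requests: Sequence[bool], stressed: bool) -> bool:
    """Hypothetical k-out-of-n trip vote whose k depends on the system state.

    Normal state: bias towards dependability, so any single request trips (k = 1).
    Stressed state (flagged by wide area monitoring): bias towards security, so all
    relays must agree (k = n) and a single hidden failure cannot cause a trip.
    """
    n = len(trip_requests)
    if n == 0:
        return False
    k = n if stressed else 1
    return sum(trip_requests) >= k


if __name__ == "__main__":
    # Balanced currents and no remote Zone 1 pickup: load encroachment, trip blocked.
    print(supervise_zone3(True, 0.02, [False, False, False, False]))  # False
    # One relay with a hidden failure requests a trip during stressed conditions.
    print(vote_trip([True, False, False], stressed=True))  # False
```

Requiring agreement from all relays is only one possible way of shifting towards security; a 2-out-of-n vote would be a less strict alternative.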
3.1.2. Impact of ICT on SIPS
A SIPS protects system security from the effects of extreme system contingencies and wide area disturbances that are beyond the scope of traditional protection. In recent years, more advanced SIPS based on the real-time wide area measurements made available by WAMS have been proposed to protect the Power System from wide area disturbances under various system conditions [35-37]. Advanced ICT enables real-time monitoring of system conditions and provides more accurate state estimation to support SIPS decision making. With phasor measurement units (PMUs) and the Global Positioning System (GPS), time-tagged measurements of Power System quantities across the entire network can be collected, enabling system-wide SIPS control actions. Synchrophasors can be applied to solve system stability problems such as oscillatory stability, voltage stability and transient stability. In addition, synchrophasors can provide system event triggering based on measurements of current, voltage and frequency and their rates of change.
Several industrial applications already use WAMPAC to enhance SIPS performance. Some utilities use the WAMPAC system to centralise existing standalone SIPS to achieve better coordination and easier maintenance [28]. Others use real-time wide area measurements to predict complex emergency operating states more precisely and to adjust adaptively, ensuring a quick and decisive response [35, 36].
The construction of extensive telecommunication networks and the continuous development of communication protocols enable the exchange of information over Wide Area Networks (WAN). For example, the GOOSE message, originally intended for communication within a local area network (LAN), is now implemented for wide area protection and control applications. As described in IEC 61850-90-1 [38], a special router configuration is used to tunnel GOOSE messages between substations, or between substations and the control centre, providing secure transfer of GOOSE in IEC 61850 format across the WAN. The use of routable GOOSE (R-GOOSE) is an emerging solution to improve wide area Power System monitoring, protection and control and to achieve centralised SIPS applications.
3.2. Communication Infrastructure of SIPS
3.2.1. General SIPS Communication Infrastructures
An overview of the SIPS communication infrastructure, together with its reliability considerations, is first given in this section. As discussed in the previous chapter, SIPS can be classified into local SIPS and system-wide SIPS based on their communication infrastructure. Most existing SIPS implementations are local schemes, distributed in isolated local substation environments with limited communication with other parts of the system. In such schemes, all the sensing, communication and decision-making devices are located in a single substation [14], making the protection action highly reliant on the local substation automation system (SAS).
With the development of ICT, SIPS are now being designed to address system-wide contingencies, which require measurements from all over the network. In addition, utilities now face more complexity in SIPS operation as the number of SIPS installed in power networks for hierarchical control actions increases. The standalone nature and widespread proliferation of SIPS require extensive control and maintenance effort. This may also increase the probability of unintended SIPS interaction and may impede the deployment of additional SIPS to enhance network capability and accommodate more renewables. All of these challenges call for a breakthrough solution: centralising the existing standalone SIPS to achieve better development, coordination and maintenance. However, this breakthrough relies on fast detecting IEDs, fast computing processors, the availability of extensive Wide Area Networks (WAN) and the continuous development of communication protocols (e.g. IEC 61850, IEC 62439).
Some utilities have already implemented SIPS with a significant degree of centralisation using the WAN. PacifiCorp's Jim Bridger RAS [26] implements a dual triple modular redundant (TMR) system to initiate its region-wide RAS applications. Information from the neighbouring RAS, located at the Idaho Power Midpoint Substation, is centralised by the Jim Bridger RAS; the two neighbouring SIPS are designed to complement each other through intertripping and coordination. A wide area SIPS was installed in the Salt River Project (SRP) system [39] to centralise the measurements of generation outputs and implement the load shedding calculation using GOOSE messages, Virtual LANs (VLANs) and priority messaging technologies enabled by IEC 61850. Southern California Edison (SCE) [28] is developing a centralised RAS (CRAS) which centralises the arming calculations and the RAS logic over a wide area monitoring and protection network involving more than 100 substations. This establishes a platform for centralised control.
[Figure: substations #1 to #N (hosting SIPS A, B, ..., X), each with merging units (MU), BIEDs, relays, a switch and a router on the substation LAN, connected over the WAN to centralised control centres A and B; controller A contains processors A1, A2 and A3. Generation plant/load substations #1 to #N host control IEDs acting on the load or generation. Functional stages: 1. Monitoring and detection (line flow monitoring, load level monitoring, line outage detection); 2. SIPS central controller (arming calculation and logic checks, mitigation calculation); 3. Mitigation (generation/load level monitoring, generator tripping, load shedding).]
Figure 3-2: General SIPS Architecture with Central Processors
To realise the centralisation of existing distributed and standalone SIPS, an extensive high-speed communication infrastructure between the local substations and the system-wide communication network is required. Figure 3-2 describes an overall communication architecture of a SIPS installed for centralised control, with redundancy applied to the monitoring devices, central controllers, mitigation relays and associated communication networks. The monitoring and detection IEDs are located at substations across the entire network and are used for different monitoring applications. The monitoring function involves measuring line power flows, voltages, frequency, rate of change of frequency and, where applicable, other parameters related to specific system conditions. The centralised SIPS controller gathers all the monitoring and protection data via a wide area network (WAN) and decides whether corrective action is required. The physical architecture of its communication network is determined by the size of the scheme and the locations of the detection sites and the mitigation actions. Once SIPS operation is required, commands must be sent to the control IEDs in the field to initiate corrective actions (e.g. generator tripping, load shedding).
3.2.2. Wide Area Communication Network
The WAN is used to gather all the information required for SIPS decision making from the various detection sites and to communicate the control commands to the mitigation devices at different locations in the system. Architectures based on the Synchronous Optical Network (SONET) and Synchronous Digital Hierarchy (SDH) protocols are normally used by utilities for wide area communication between major substations in their power networks. SONET and SDH are standardised protocols that transfer multiple digital bit streams synchronously over optical fibre. The fibre-based communication system is normally built in a ring configuration, as shown in Figure 3-3, and its bi-directional dual-ring topology provides redundant communication paths. Data exchange between WAN nodes normally relies on the primary ring; upon loss of the primary ring, SONET equipment can switch to the backup ring in as little as 4 milliseconds [40, 41].
[Figure: SONET/SDH dual-ring WAN linking Nodes 1 to 4, which serve the control centre and substations #1 to #3. Primary ring fibre links: FI 12, FI 23, FI 34, FI 41; backup ring fibre links: FI 14, FI 43, FI 32, FI 21.]
Figure 3-3: WAN SONET Architecture
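The primary/backup behaviour of the dual ring can be illustrated with a simplified path-switched model of Figure 3-3. This is a sketch under stated assumptions: traffic normally travels on the primary ring, moves to the counter-rotating backup ring when any fibre on its primary path is cut, and incurs the best-case 4 ms switchover quoted above. Real SONET/SDH protection schemes (e.g. line loopback) behave differently in detail; the function names and the ring ordering 1, 2, 3, 4 are assumptions read from the figure labels.

```python
from typing import Optional

NODES = [1, 2, 3, 4]      # ring order assumed from Figure 3-3
SWITCHOVER_MS = 4.0       # best-case SONET switchover time quoted in the text


def ring_links(src: int, dst: int, primary: bool) -> list:
    """Fibre links (named as in Figure 3-3) traversed from src to dst.

    The primary ring runs 1 -> 2 -> 3 -> 4 -> 1 (FI 12, FI 23, FI 34, FI 41);
    the backup ring runs the opposite way (FI 14, FI 43, FI 32, FI 21).
    """
    step = 1 if primary else -1
    i = NODES.index(src)
    links = []
    while NODES[i] != dst:
        j = (i + step) % len(NODES)
        links.append(f"FI {NODES[i]}{NODES[j]}")
        i = j
    return links


def route(src: int, dst: int, failed: set) -> Optional[tuple]:
    """Return (ring used, added switchover delay in ms), or None if both rings are cut."""
    if not failed.intersection(ring_links(src, dst, primary=True)):
        return ("primary", 0.0)
    if not failed.intersection(ring_links(src, dst, primary=False)):
        return ("backup", SWITCHOVER_MS)
    return None


if __name__ == "__main__":
    # A cable cut between Nodes 2 and 3 removes both fibres of that span.
    print(route(1, 3, {"FI 23", "FI 32"}))  # ('backup', 4.0)
```

Only a second, independent cut on the backup path (for example FI 43) leaves the two nodes disconnected. This is why the dual ring removes single-fibre failures as a cause of lost SIPS data.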
The control centre for the system-wide centralised SIPS collects data from all the major substations across the network and processes all the logic. Since the correct and timely response of the centralised SIPS is critical to the stability and reliability of large-scale wide area power networks, the central controller must be highly dependable. For this reason, redundant inputs, outputs, processor units and telecommunication networks must be implemented. At the same time, the controller must be secure, because unexpected remedial actions caused by spurious SIPS operations could incur significant costs. To balance dependability with security and achieve optimal performance, utilities have developed dual triple modular redundant (TMR) voting control systems, as shown in Figure 3-2. The two controllers are installed in geographically separated locations and back each other up. Each central controller contains three processors, at least two of which must reach the same decision to initiate an operation. Ethernet links between the two systems are also provided for data exchange.
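The benefit of 2-out-of-3 voting can be quantified with the standard TMR formula. The sketch below assumes three identical, independent processors with reliability r, a perfect voter, and two TMR controllers acting in parallel. It ignores common-mode failures and the inter-controller Ethernet links, and the function names are illustrative only. The closed form 3r² − 2r³ is cross-checked by enumerating all processor states.

```python
from itertools import product


def majority_vote(decisions) -> bool:
    """2-out-of-3 vote: operate only if at least two processors agree."""
    return sum(decisions) >= 2


def tmr_reliability(r: float) -> float:
    """Closed form: P(at least 2 of 3 independent processors work) = 3r^2 - 2r^3."""
    return 3 * r**2 - 2 * r**3


def tmr_reliability_enumerated(r: float) -> float:
    """Same probability obtained by enumerating all 2**3 processor up/down states."""
    total = 0.0
    for states in product([True, False], repeat=3):
        p = 1.0
        for up in states:
            p *= r if up else (1 - r)
        if majority_vote(states):
            total += p
    return total


def dual_tmr_reliability(r: float) -> float:
    """Two geographically separated TMR controllers backing each other up (in parallel)."""
    return 1 - (1 - tmr_reliability(r)) ** 2


if __name__ == "__main__":
    for r in (0.9, 0.99):
        print(r, round(tmr_reliability(r), 6), round(dual_tmr_reliability(r), 6))
```

For r = 0.9, a single TMR controller gives 0.972 and the dual arrangement gives about 0.9992. The same vote also serves security, because a single processor's spurious decision cannot initiate an operation.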
3.2.3. Substation Automation System
A Power System substation, as a key node in a power network, plays a vital role in
monitoring and controlling power flows and interconnecting generating facilities,
transmission and distribution networks and customers. Successful operations of both
local and system-wide SIPS are heavily reliant on the monitoring, communication and
control functions in the substation automation system. A substation consists of
numerous items of switchgear and measuring devices, and these are controlled,
supervised, and protected by the Substation Automation System (SAS). The main
features of the SAS are to [42]:
Control or monitor all the electrical equipment in a substation
Communicate with the remote SCADA system
Control or monitor electrical equipment in a local bay
Monitor the status of all the connected substation automation equipment
Monitor the condition of substation electrical equipment (e.g. switchgear, transformers, relays)
Manage the energy flows
SIPS rely in particular on the instrumentation, monitoring, communication, control and protection systems used in the SAS. The measurements required by SIPS are collected by substation-based sensor IEDs and transmitted to a station host computer. The data are then either used for local decision making or sent via a WAN for centralised decision making. The communication infrastructure of the SAS and its reliability are studied in the following sections.
The advent of IEC 61850 [43] has significantly facilitated communication services in substations and overcome problems of interoperability between different devices. As a special protection and control system, SIPS will benefit from the fast and highly reliable communications provided by an IEC 61850 based SAS. Under IEC 61850, data transmission in a substation is based on a standardised data model and communication services. The use of IEC 61850 significantly improves the reliability of the SAS by replacing a multitude of copper wires with serial communication links (e.g. fibre optic) [44]. Successful communication in the SAS also relies on a reliable physical Substation Communication Network (SCN) architecture, in which redundancy in each communication layer is required to eliminate single-point failures.
Figure 3-4: Substation Automation Architecture from Hardwire to IEC 61850 [45]
3.2.4. Centralised SIPS: Speed Requirement
To achieve the centralisation of distributed schemes, a high-speed broadband communication network is required to deliver the measurement information and control signals. The time requirement for SIPS varies with the application. For SIPS designed to mitigate overloading in transmission systems, the operating time requirement can be several minutes. However, for SIPS designed to improve transient stability, a timeframe of 100 milliseconds is normally required. Therefore, the overall mitigation action must be fast enough to satisfy the application with the most stringent time requirement implemented in the SIPS. A review of current SIPS time requirements and of stability studies of the most severe faults [14, 26] indicates that the total time from the triggering event to the SIPS actions must not exceed 5 cycles. For a 50 Hz system (20 ms per cycle), a timeframe of 100 ms was therefore set as the time requirement under the most severe conditions. The allocation of this timeframe is illustrated in Figure 3-5.
[Figure: evolution from normal operational conditions (stability with robustness) to degraded conditions (stability without robustness; preventive actions), then a triggering event and the SIPS actions within the required timeframe, followed on the timeline by instability and the evolution of the collapse. Time allocation within the SIPS timeframe: monitoring relay processing 16 ms; data transmission to controller 19 ms; central controller processing 15 ms; data transmission to mitigation device 19 ms; mitigation relay processing 5 ms; trip contacts 25 ms.]
Figure 3-5: Time Breakdown of a Time-Critical SIPS Application
The breakdown of the SIPS timeframe allows 16 ms for the detection relays to detect the fault, 15 ms for decision making in the central controller and a total of 30 ms (5 ms of relay processing plus 25 ms for the trip contacts) for the mitigation relay to trigger the remedial action [46]. An interval of 19 ms is left for each data transmission, from the relay to the control centre and vice versa. Utilities have made considerable efforts to test the speed of the wide area communication networks used for centralised SIPS, and the test results indicate that WAN communication is fast enough for SIPS applications. For example, SCE uses routable GOOSE messages for wide area communication, and its test results indicate that a 19 ms interval for bi-directional data transmission is sufficient for data transport over a 660-mile communication network (far enough to cover most remote locations of SCE's service territory) [46]. This leaves sufficient margin for other possible delays in the communication process.
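The arithmetic behind the time allocation can be checked directly. The sketch below adds up the six intervals of Figure 3-5 and compares the total with the 5-cycle limit at 50 Hz. It also converts the ¼-cycle SAS delivery time used in the next paragraph. The dictionary keys and helper names are illustrative choices, not part of the source.

```python
# Time allocation of Figure 3-5, in milliseconds.
BUDGET_MS = {
    "monitoring relay processing": 16,
    "data transmission to controller": 19,
    "central controller processing": 15,
    "data transmission to mitigation device": 19,
    "mitigation relay processing": 5,    # together with the trip contacts: the
    "trip contacts": 25,                 # 30 ms quoted for the mitigation relay
}


def cycles_to_ms(cycles: float, frequency_hz: float = 50.0) -> float:
    """Convert a number of power-frequency cycles to milliseconds."""
    return 1000.0 * cycles / frequency_hz


def budget_margin(budget: dict, limit_ms: float) -> tuple:
    """Return (total allocated time, remaining margin) in milliseconds."""
    total = sum(budget.values())
    return total, limit_ms - total


if __name__ == "__main__":
    limit = cycles_to_ms(5)                     # 5 cycles at 50 Hz
    total, margin = budget_margin(BUDGET_MS, limit)
    print(limit, total, margin)                 # 100.0 99 1.0
    print(cycles_to_ms(0.25))                   # 1/4 cycle at 50 Hz: 5.0
```

The allocated 99 ms leaves only 1 ms of slack against the 100 ms limit. Any additional delay must therefore come out of the 19 ms transmission slots, which is why the SCE test results on those slots matter.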
The communication speed within the SAS is also critical to satisfying the speed requirement of each SIPS application. Table 3-1 lists the time requirements for different substation automation applications, based on the IEEE standard communication delivery time performance requirements for electric power substation automation [47]. For the most time-critical information, the SAS can achieve a communication time of ¼ cycle (i.e. 0.005 s at 50 Hz). Consequently, an extensive high-speed wide area communication network, together with fast computing central controllers and high-speed substation automation system communication, can fulfil the time requirements of different SIPS applications.
Table 3-1: Grace Time for Substation Automation Systems
Applications | Typical grace time
Uncritical automation applications, e.g. enterprise resource planning, manufacturing execution | 10 s
Automation management, e.g. human interface, SCADA, building automation, thermal | 2 s
General automation, e.g. process & manufacturing industry, power plants | 0.2 s
Time-critical automations, e.g. synchronised drives, breaker failure protection, back-up breaker tripping, etc. | 0.005 s
3.3. IEC 61850 based Substation Automation System and its Reliability Model
A highly reliable, fast and deterministic communication network is vital for the successful execution of a SIPS application. The penetration of ICT brings significant changes in the instrumentation, monitoring, communication, control and protection systems of the Power System. With more hardware devices, software routines and user-defined settings, concerns have been raised about the reliability of protection and control systems dominated by digital communication. In particular, the application of IEC 61850 substation automation systems and microprocessor-based multi-function IEDs provides more flexibility in system design and opens up a vast range of solutions for SAS architecture [48]. Different SAS communication architectures, implemented in accordance with various communication protocols, are reviewed in this section, and a method to quantitatively evaluate the reliability of different communication services in the SAS is proposed. The method could help ensure that the reliability levels required by various SIPS applications are achieved.
3.3.1. IEC 61850 based Substation Station Bus Architectures
In general, the SAS is a hierarchical structure comprising three levels, namely the station level, the bay level and the process level. These three levels are connected by two buses: the station bus and the process bus. The station bus carries communication between the protection, control and monitoring IEDs installed at the bay level and the station level devices, such as the station computer with the human machine interface (HMI) and the gateway to the network communication centre, whilst the process bus connects the bay units (i.e. protection and control devices) with the switchyard devices (e.g. breakers, CTs and isolators). Multiple network redundancy protocols can be implemented to enable communication network reconfiguration and self-healing of the communication path in case of device or link failures.
3.3.1.1. Star & Ring Station Bus Architectures
Figure 3-6 shows two typical substation communication network (SCN) architectures: star and ring. In the star architecture, a central station-level switch connects all the bay switches, which in turn connect the IEDs allocated to each bay. Ethernet switches provide a common connection point for devices by storing incoming packets and forwarding them to the specified destination on the LAN. In this case, the central switch becomes a single point of failure for the whole SCN and thus significantly affects the reliability of the station bus communication network. Communication redundancy can be achieved by using the ring architecture, which forms a ring of switches. The Rapid Spanning Tree Protocol (RSTP), as defined in IEEE 802.1w, can be integrated into the ring structure to prevent communication loops, which may cause flooding due to data duplication and recirculation. RSTP automatically readjusts to a failure by sending data to its destination in the opposite direction upon detecting a break at one point of the ring. This achieves so-called “standby” or “dynamic” redundancy: when a primary path failure is detected in the RSTP ring, the alternative standby path must be switched into action within a certain time, called the reconfiguration time. A typical reconfiguration time of 2 s can be provided by the RSTP ring architecture, which is faster than the conventional Spanning Tree Protocol (STP), with an average switchover delay of 30 s in the event of a failure [49].
Figure 3-6: Star (left) & Ring (right) Type SCN Architectures
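The single-point-of-failure argument for the star and the redundancy argument for the ring can be made concrete with a small reliability block diagram. This sketch makes several simplifying assumptions: identical switch availabilities, perfect links, IEDs excluded, and instantaneous RSTP recovery (the 2 s reconfiguration time is ignored). Under these assumptions, the path between two bay switches in a star is a series chain through the central switch. In a ring, the two end switches are in series with two parallel arcs of intermediate switches. The function names and the example availability are hypothetical.

```python
from math import prod


def series(*avail: float) -> float:
    """All blocks are needed."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """At least one block is needed."""
    return 1 - prod(1 - a for a in avail)


def star_path_availability(a_bay: float, a_central: float) -> float:
    """Bay switch -> central switch -> bay switch, all in series."""
    return series(a_bay, a_central, a_bay)


def ring_path_availability(a_sw: float, n_switches: int, hops: int) -> float:
    """Two bay switches `hops` apart on a ring of `n_switches` identical switches.

    Both end switches are always needed; the traffic can then take either arc,
    so the intermediate switches of the two arcs form two parallel branches.
    Reconfiguration time is ignored (ideal, instantaneous recovery).
    """
    if not 1 <= hops <= n_switches - 1:
        raise ValueError("hops must be between 1 and n_switches - 1")
    arc_1 = a_sw ** (hops - 1)
    arc_2 = a_sw ** (n_switches - hops - 1)
    return series(a_sw, a_sw, parallel(arc_1, arc_2))


if __name__ == "__main__":
    a = 0.999  # hypothetical switch availability
    print(star_path_availability(a, a))
    print(ring_path_availability(a, n_switches=6, hops=3))
```

Setting the central switch availability to zero makes the star path unavailable regardless of the other switches. The ring, by contrast, loses a path only when both arcs are broken.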
Standby redundancy provided by STP and RSTP requires switchover time when the primary path fails. However, with the advent of the High-availability Seamless Redundancy (HSR) protocol standardised in IEC 62439 [50], bumpless redundancy can be provided for ring topology networks. A simple HSR network is shown in Figure 3-7. A Doubly Attached Node running HSR (DANH) simultaneously sends duplicated multicast frames (i.e. frames A and B), one in each direction around the ring, to the recipients on the network. If one of the communication paths fails, the destination can still receive the signal from the other path without any reconfiguration time.
Figure 3-7: Example of IEC 62439-3 HSR Network [42]
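The bumpless behaviour of HSR can be illustrated by checking whether either of the two frame copies finds an intact path around the ring. This is a minimal sketch under stated assumptions: the nodes are numbered 0 to n−1 around the ring (a hypothetical numbering), only link failures are modelled, and the receiver's discarding of the second copy is not shown.

```python
def hsr_delivered(n_nodes: int, src: int, dst: int, failed_links: set) -> bool:
    """True if at least one copy of an HSR frame pair reaches dst.

    Nodes 0..n_nodes-1 form a ring; a link is frozenset({k, (k + 1) % n_nodes}).
    Frame A travels one way and frame B the other, so no reconfiguration is
    needed: whichever copy finds an intact path is delivered.
    """
    def path_intact(step: int) -> bool:
        node = src
        while node != dst:
            nxt = (node + step) % n_nodes
            if frozenset((node, nxt)) in failed_links:
                return False
            node = nxt
        return True

    return path_intact(+1) or path_intact(-1)


if __name__ == "__main__":
    cut = {frozenset((1, 2))}              # one broken link in a 6-node ring
    print(hsr_delivered(6, 0, 3, cut))     # True: frame B arrives via nodes 5 and 4
```

Unlike an RSTP ring, nothing has to be recomputed after the cut. The surviving copy is simply the one that arrives, so the recovery time is zero.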
3.3.1.2. Parallel Redundancy Protocol based Station Bus Architectures
The concept of the Parallel Redundancy Protocol (PRP), as standardised in IEC 62439-3 [50], is to connect the IEDs to two separate and independent Local Area Networks (LAN A and LAN B) and to send duplicated Ethernet frames through these two networks simultaneously. Consequently, if one frame fails to reach the destination due to traffic or network failure, the destination can still receive the required information from the other network without any reconfiguration time, hence providing seamless redundancy. Figure 3-8 shows the redundant double-star and double-ring station bus architectures implemented in accordance with IEC 62439-3 PRP. Note that IEDs which are Singly Attached Nodes (SANs) can be connected to the PRP networks via a Redundancy Box (RedBox).
Figure 3-8: Redundant Double-Star (left) & Double-Ring (right) SCN Architectures
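Because every frame arrives twice under PRP when both LANs are healthy, the receiving node must pass one copy to the application and drop the other. The sketch below is a simplified duplicate discard. It assumes each frame is identified by its source and a per-source sequence number, and it uses a bounded history in place of the drop window of a real node. The class and method names are hypothetical, and sequence-number wrap-around is not handled.

```python
from collections import deque


class DuplicateDiscard:
    """Simplified receiver-side duplicate discard for PRP (and, similarly, HSR).

    A frame is identified by (source, sequence number). The first copy, from
    whichever LAN delivers it first, is passed to the application; the later copy
    is dropped. A bounded history stands in for the drop window of a real node.
    """

    def __init__(self, history: int = 1024):
        self._seen = set()
        self._order = deque()
        self._history = history

    def accept(self, source: str, seq: int) -> bool:
        key = (source, seq)
        if key in self._seen:
            return False                  # duplicate: discard
        self._seen.add(key)
        self._order.append(key)
        if len(self._order) > self._history:
            self._seen.discard(self._order.popleft())
        return True                       # first copy: deliver


if __name__ == "__main__":
    rx = DuplicateDiscard()
    print([rx.accept("IED-1", 7),   # frame 7 via LAN A: delivered
           rx.accept("IED-1", 7),   # frame 7 via LAN B: discarded
           rx.accept("IED-1", 8)])  # frame 8 lost on LAN A, copy via LAN B: delivered
```

The application sees exactly one copy of each frame whether both LANs, or only one, deliver it. This is the sense in which PRP redundancy is seamless.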
3.3.1.3. Reconfiguration Time for Common Redundancy Protocols
The switchover delays of different communication protocols are reviewed in Table 3-2. The reconfiguration time following the failure of the primary communication path varies with the protocol. STP and RSTP switch to the alternative communication path in a time range from 2 to 20 seconds, which is acceptable for data used by SCADA or HMI applications. However, this may not fulfil the requirements of time-critical substation automation such as protection and underfrequency load shedding. The communication path redundancy provided by HSR and PRP is ideal for all substation automation applications, since no switchover time is required.
Table 3-2: Reconfiguration Time for Common Redundancy Protocols
Protocol | Description | Typical recovery time
STP | Spanning Tree Protocol | 20 s
RSTP | Rapid Spanning Tree Protocol | 2 s
PRP | Parallel Redundancy Protocol | 0 s
HSR | High-availability Seamless Redundancy | 0 s
3.3.2. IEC 61850-9-2 based Process Bus Architectures
As for the station bus, bay level and process level redundancy has to be considered to eliminate all possible single-point failures in the SAS. To evaluate its reliability, the components connected to the process bus must first be determined. Each bay has dedicated IEDs executing the control and protection functions. The process bus collects the voltage and current signals from the instrument transformers (i.e. CTs and VTs), digitised by the Merging Unit (MU), and transfers them to the bay level IEDs. The merging unit is the interface device that converts the analogue data from the instrument transformers into sampled value streams for the substation IEDs. An IEC 61850 compatible IED is normally equipped with an internal clock for time stamping, providing about 1 ms accuracy. External time sources (TS) can be used to provide more accurate system-wide time synchronisation in compliance with IEEE 1588 [51]. Ethernet switches (ESW) are active communication nodes connecting Ethernet interfaces, which receive, process and forward Ethernet packets to specific ports.
Two IEC 61850 based process bus architectures are considered here, both of which include redundancy in the bay level components, as shown in Figure 3-9. Bay Protection Units (BPUs) are normally implemented redundantly (Main1 and Main2) in the SAS due to their critical role in fault detection. In Architecture (2) of Figure 3-9, the process bus communication system and the connected devices are also implemented redundantly, so each bay IED has its own independent process bus communication system. All the bay protection and control IEDs are assumed to be doubly attached nodes (DANP) and can therefore be connected either to a single local area network (LAN A, without the dashed part) or to two independent LANs (LAN A and LAN B). Only the level of redundancy in the process bus network is considered. Other possible solutions with different topologies can be found in [52, 53].
Figure 3-9: Two Process Bus Sensor Network Architectures
3.3.3. Reliability Model of the Substation Automation System
A number of previous publications have assessed the reliability and availability of the SAS using fault-tree analysis (FTA), reliability block diagrams (RBD) and tie-set methods [52, 53]. Prior to the application of IEC 61850 in substations, FTA was frequently used to evaluate the reliability of the automation system [54]. Since IEC 61850 and other communication protocols introduced more redundancy into the communication routes, a combination of the RBD and cut-set methods is normally applied to assess the reliability quantitatively and to take account of all the redundancy in the reliability model. However, most SAS reliability studies focused on the reliability/availability of the entire SAS communication architecture rather than that of specific communication services. This makes it difficult to use the assessment results to evaluate the reliability of monitoring, protection and control applications, which depend on particular communication services.
Consequently, the reliability of different SAS communication services is assessed in this section, and the resulting reliability data are further used for the SIPS risk assessment in Chapter 5 and Chapter 6. A reliability assessment method based on the analytical reliability block diagram (RBD) and stochastic Monte Carlo simulation is proposed in this chapter. Instead of considering all the devices within the SAS, only the communication path used to conduct each communication service in IEC 61850 based digital substations is studied. The RBD is first used to represent the logical connections of the components needed for each communication service. The reliability assessment method introduced in the following section is then used to estimate the overall reliability of the different communication architectures.
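The analytical RBD and a Monte Carlo simulation can be used to cross-check each other on a single communication-service path. The sketch below assumes a hypothetical path in which a merging unit feeds an IED through two redundant Ethernet switches (LAN A and LAN B in parallel). It also assumes exponentially distributed times to failure. The failure rates are illustrative values, not data from this thesis. In each Monte Carlo sample, series blocks take the minimum of the component lifetimes and parallel blocks take the maximum.

```python
import math
import random

# Hypothetical failure rates (per year) for one communication-service path:
# MU -> (ESW_A on LAN A || ESW_B on LAN B) -> IED. Not data from the thesis.
LAMBDA = {"MU": 0.02, "ESW_A": 0.05, "ESW_B": 0.05, "IED": 0.02}


def reliability_rbd(t: float) -> float:
    """Analytical RBD: series blocks multiply, parallel blocks combine as 1 - prod(1 - R)."""
    r = {name: math.exp(-lam * t) for name, lam in LAMBDA.items()}
    lan = 1 - (1 - r["ESW_A"]) * (1 - r["ESW_B"])
    return r["MU"] * lan * r["IED"]


def sample_lifetime(rng: random.Random) -> float:
    """One Monte Carlo sample: exponential times to failure, series = min, parallel = max."""
    ttf = {name: rng.expovariate(lam) for name, lam in LAMBDA.items()}
    lan = max(ttf["ESW_A"], ttf["ESW_B"])
    return min(ttf["MU"], lan, ttf["IED"])


def reliability_mc(t: float, n: int = 100_000, seed: int = 1) -> float:
    """Fraction of sampled path lifetimes that exceed t."""
    rng = random.Random(seed)
    return sum(sample_lifetime(rng) > t for _ in range(n)) / n


if __name__ == "__main__":
    for t in (1.0, 5.0):
        print(t, round(reliability_rbd(t), 4), round(reliability_mc(t), 4))
```

The analytical result is exact under these assumptions. The simulation is useful when component behaviour (repair, non-exponential lifetimes) makes a closed form impractical.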
In addition, repair plays a vital role in maintaining the availability of the communication network. In a repairable system, a fault in an electronic component can be detected by the self-monitoring system embedded in the IEDs, and the faulty component can then be fixed or replaced in a timely manner. However, depending on the nature of the fault, not all failures can be detected promptly. Consequently, the reliability indices of the SAS are examined both with and without consideration of repair.
Since the SAS architecture consists of a combination of series and parallel subsystems, the theoretical analysis of the basic system structures consisting of two components (series and parallel), shown in Figure 3-10, is presented first. The reliability (i.e. the probability that the system will operate throughout a specified time interval) and the availability (i.e. the probability that the system is in the available state at a given time) of the substation automation system in performing different communication services need to be assessed.
Figure 3-10: Basic Two-Component System in (a) Series and (b) Parallel
1) Non-repairable System Reliability Assessment
For non-repairable systems, the repair rate is zero: once a component enters its failed
state, it can never return to the normal state. The failure of each SAS component is
modelled by an exponential distribution with a constant failure rate $\lambda_i$. The
probability that component i remains in the operating state throughout a time interval t,
$R_i(t)$, is:
$$R_i(t) = e^{-\lambda_i t}$$ (3-1)
$$\mathrm{MTTF}_i = \frac{1}{\lambda_i}$$ (3-2)
where $\mathrm{MTTF}_i$ is the mean time to failure of component i.
Series System: The reliability of a non-repairable system consisting of two components
connected in series is:
$$R_{sys}(t) = e^{-\lambda_1 t}\, e^{-\lambda_2 t} = e^{-(\lambda_1 + \lambda_2) t}$$ (3-3)
Parallel System: The reliability of a non-repairable system consisting of two components
connected in parallel is:
$$R_{sys}(t) = 1 - \left(1 - e^{-\lambda_1 t}\right)\left(1 - e^{-\lambda_2 t}\right)$$ (3-4)
The mean time to failure (MTTF) of both non-repairable systems can be calculated as:
$$\mathrm{MTTF}_{sys} = \int_0^{\infty} R_{sys}(t)\, dt$$ (3-5)
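The two-component formulas can be checked numerically. The Python sketch below evaluates Eqs. (3-1) and (3-3) to (3-5) for illustrative failure rates (an assumption, not data from this study) and compares a trapezoidal approximation of Eq. (3-5), truncated at a finite horizon, with the closed-form MTTFs obtained by integrating Eqs. (3-3) and (3-4) analytically. Function names are illustrative only.

```python
import math


def r_component(lam: float, t: float) -> float:
    """Eq. (3-1): reliability of a component with constant failure rate lam."""
    return math.exp(-lam * t)


def r_series(lam1: float, lam2: float, t: float) -> float:
    """Eq. (3-3): the series system survives only if both components survive."""
    return r_component(lam1, t) * r_component(lam2, t)


def r_parallel(lam1: float, lam2: float, t: float) -> float:
    """Eq. (3-4): the parallel system fails only if both components have failed."""
    return 1.0 - (1.0 - r_component(lam1, t)) * (1.0 - r_component(lam2, t))


def mttf_numeric(r_sys, horizon: float, steps: int = 100_000) -> float:
    """Eq. (3-5) approximated by the trapezoidal rule on [0, horizon]."""
    h = horizon / steps
    total = 0.5 * (r_sys(0.0) + r_sys(horizon))
    for k in range(1, steps):
        total += r_sys(k * h)
    return total * h


# Illustrative failure rates (per year), i.e. MTTFs of 100 and 300 years (assumed)
lam1, lam2 = 1 / 100, 1 / 300
horizon = 20_000.0  # years; R_sys(t) is negligible beyond this point

mttf_series = mttf_numeric(lambda t: r_series(lam1, lam2, t), horizon)
mttf_parallel = mttf_numeric(lambda t: r_parallel(lam1, lam2, t), horizon)

# Closed forms obtained by integrating Eqs. (3-3) and (3-4) analytically
exact_series = 1 / (lam1 + lam2)
exact_parallel = 1 / lam1 + 1 / lam2 - 1 / (lam1 + lam2)

print(f"series:   {mttf_series:.3f} years (exact {exact_series:.3f})")
print(f"parallel: {mttf_parallel:.3f} years (exact {exact_parallel:.3f})")
```

For these assumed rates the parallel arrangement more than quadruples the MTTF of the series arrangement (325 versus 75 years).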
2) Repairable System Reliability Assessment
A Markov model is used to represent the different operating states of a repairable
system. For a system consisting of two components, there are four possible states, as
shown in Figure 3-11, where ‘U’ and ‘D’ denote the component up and down states
respectively. The reliability of a series system and of a parallel system is assessed in turn.
Figure 3-11: 4-State Markov Model
Series System: In the case of a series system, State 1 represents the system up state,
while the other three states are down states. The system failure rate is obtained by
adding all the transition rates leaving State 1:
$$\lambda_{sys} = \sum_{i=1,2} \lambda_i = \lambda_1 + \lambda_2$$ (3-6)
$$\mathrm{MTTF}_{sys} = \frac{1}{\lambda_{sys}}$$ (3-7)
Parallel System: For a parallel system, States 1, 2 and 3 represent the system up states,
while State 4 is the system down state, since the failure of one component does not affect
the successful operation of the entire system. With $\mu_i$ denoting the repair rate of
component i, the steady-state probabilities of being in the failure state ($p_f$) and in a
success state ($p_s$) for a parallel system can be estimated as:
$$p_f = p_{f1}\, p_{f2} = \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i}$$ (3-8)
$$p_s = 1 - p_f = 1 - \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i}$$ (3-9)
The repair rate of the system can then be obtained by adding all the transition rates
departing the failure state (i.e. State 4):
$$\mu_{sys} = \sum_{i=1,2} \mu_i = \mu_1 + \mu_2$$ (3-10)
In the steady state, the frequency of transitions into the failure state equals the frequency
of transitions out of it into a success state. This allows the concept of an equivalent
transition rate to be used to calculate the failure rate of the parallel system. The system
transition frequency to the failure state, $f_f$, and the frequency to the success state, $f_s$,
can be obtained as:
$$f_f = f_s = p_f\, \mu_{sys} = \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i} \sum_{i=1,2} \mu_i$$ (3-11)
The failure rate of the parallel system equals:
$$\lambda_{sys} = \frac{f_f}{p_s}$$ (3-12)
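Once the failure rates of the series and parallel systems are known, the MTTF is the
reciprocal of the system failure rate:
$$\mathrm{MTTF}_{sys} = \frac{1}{\lambda_{sys}}$$ (3-13)
The availability ($A_{sys}$) of the entire system can be calculated as:
$$A_{sys} = \frac{\mathrm{MTTF}_{sys}}{\mathrm{MTTF}_{sys} + \mathrm{MTTR}_{sys}}$$ (3-14)
where $\mathrm{MTTR}_{sys}$ is the mean time to repair of the system (for the parallel system, $\mathrm{MTTR}_{sys} = 1/\mu_{sys}$).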
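A minimal Python sketch of Eqs. (3-6) to (3-14), assuming two identical bay IEDs with the MTTF and MTTR given in Table 3-3; the function names are illustrative only. Rates are expressed per hour so that the 24-hour MTTR can be used directly.

```python
HOURS_PER_YEAR = 8760


def series_repairable(lams):
    """Eqs. (3-6) and (3-7): failure rate and MTTF of a repairable series system."""
    lam_sys = sum(lams)
    return lam_sys, 1.0 / lam_sys


def parallel_repairable(lams, mus):
    """Eqs. (3-8) to (3-14) for two repairable components in parallel."""
    p_f = 1.0
    for lam, mu in zip(lams, mus):
        p_f *= lam / (lam + mu)          # Eq. (3-8): both components down
    p_s = 1.0 - p_f                      # Eq. (3-9)
    mu_sys = sum(mus)                    # Eq. (3-10): rates leaving the down state
    f_f = p_f * mu_sys                   # Eq. (3-11): f_f = f_s in steady state
    lam_sys = f_f / p_s                  # Eq. (3-12)
    mttf = 1.0 / lam_sys                 # Eq. (3-13)
    mttr = 1.0 / mu_sys                  # mean duration of the system down state
    availability = mttf / (mttf + mttr)  # Eq. (3-14)
    return lam_sys, mttf, availability


# Two bay IEDs with MTTF = 100 years and MTTR = 24 h (values from Table 3-3)
lam_ied = 1.0 / (100 * HOURS_PER_YEAR)   # failures per hour
mu_ied = 1.0 / 24                        # repairs per hour

_, mttf_series_h = series_repairable([lam_ied, lam_ied])
_, mttf_parallel_h, a_parallel = parallel_repairable([lam_ied, lam_ied], [mu_ied, mu_ied])

print(f"series MTTF:   {mttf_series_h / HOURS_PER_YEAR:.1f} years")
print(f"parallel MTTF: {mttf_parallel_h / HOURS_PER_YEAR:.3e} years, A = {a_parallel:.12f}")
```

Under these assumptions the duplicated pair's MTTF rises from 150 years without repair (3/(2λ), from Eq. (3-5)) to the order of 10^6 years with repair, the same qualitative effect seen for Arch7 and Arch8 in Table 3-4.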
In a complex system with a combination of both series and parallel sub-systems, the
network reduction method can be used to merge the sub-systems and obtain the
reliability indices for the whole system. The reliability analysis, combined with
sensitivity analysis, helps identify the most reliable SCN architecture for different
communication applications. In addition, the most critical components in the
communication network, which require more inspection, can be identified through this
evaluation.
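The network reduction can be sketched by collapsing each series or parallel pair into an equivalent block with failure rate λ and repair rate μ. The parallel rule follows Eqs. (3-8) to (3-12). The series repair rate is not given by the equations above, so the sketch assumes a frequency-duration balance (down-state entry frequency divided by down-state probability); reducing more than two blocks pairwise is likewise an approximation. The path structure is hypothetical, with MTTF/MTTR values from Table 3-3, and all names are illustrative.

```python
from dataclasses import dataclass
from functools import reduce

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class Block:
    lam: float  # equivalent failure rate (per hour)
    mu: float   # equivalent repair rate (per hour)

    @property
    def unavailability(self) -> float:
        return self.lam / (self.lam + self.mu)


def from_table(mttf_years: float, mttr_hours: float) -> Block:
    return Block(1.0 / (mttf_years * HOURS_PER_YEAR), 1.0 / mttr_hours)


def series2(a: Block, b: Block) -> Block:
    lam = a.lam + b.lam                                      # Eq. (3-6)
    p_up = (1 - a.unavailability) * (1 - b.unavailability)  # both blocks up
    freq = p_up * lam                                        # frequency of leaving the up state
    mu = freq / (1.0 - p_up)                                 # frequency balance (assumption)
    return Block(lam, mu)


def parallel2(a: Block, b: Block) -> Block:
    p_f = a.unavailability * b.unavailability                # Eq. (3-8)
    mu = a.mu + b.mu                                         # Eq. (3-10)
    lam = p_f * mu / (1.0 - p_f)                             # Eqs. (3-11), (3-12)
    return Block(lam, mu)


def series(*blocks: Block) -> Block:
    return reduce(series2, blocks)


def parallel(*blocks: Block) -> Block:
    return reduce(parallel2, blocks)


def mttf_years(block: Block) -> float:
    return 1.0 / block.lam / HOURS_PER_YEAR                  # Eq. (3-13)


# Hypothetical path: one bay IED feeding a station client over one or two LANs
ied = from_table(100, 24)
bay_sw, station_sw, em = from_table(300, 24), from_table(250, 24), from_table(800, 24)
lan = series(bay_sw, em, station_sw, em)

single_lan = series(ied, lan)
dual_lan = series(ied, parallel(lan, lan))
print(f"single LAN: {mttf_years(single_lan):.1f} years")
print(f"dual LAN:   {mttf_years(dual_lan):.1f} years")
```

With repair, the duplicated LAN contributes almost nothing to the equivalent failure rate, so the remaining single IED dominates the MTTF of the reduced path.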
Assessing the reliability of the SAS requires an overall knowledge of its physical layout,
reliability data and maintenance strategy. Based on the main SAS components and their
hierarchical topologies, eight SAS architectures are proposed, covering all the possible
combinations of the considered process bus and station bus structures:
Arch1: Single star station bus & single process bus
Arch2: Single ring station bus & single process bus
Arch3: Double star station bus & single process bus
Arch4: Double ring station bus & single process bus
Arch5: Single star station bus & duplicated process bus
Arch6: Single ring station bus & duplicated process bus
Arch7: Double star station bus & duplicated process bus
Arch8: Double ring station bus & duplicated process bus
Of these, the first four SAS architectures are not implemented with redundant bay
communication networks, whereas the last four architectures include them (duplicated
process bus). Moreover, Arch7 and Arch8 are fully redundant from the station bus
down to the process bus, which eliminates all possible single points of failure in the
SAS.
3.3.4. Reliability Data
To objectively evaluate different SAS architectures, the reliability parameters of the
components must first be agreed upon. Table 3-3 shows the Mean Time to Failure
(MTTF) and Mean Time to Repair (MTTR) values for all the components used in the
substation. The reliability data are based on previously published literature and
IEEE reliability standards [52, 53, 55, 56].
The bay IEDs and the external time sources (TS) are considered relatively unreliable
devices due to the large amount of hardware, software routines and settings they contain.
In addition, GPS time reference signals can easily be jammed, blocked or
interfered with. The reliability of an Ethernet switch (SW) depends on the number of
ports it employs. Consequently, the station switch for the ring architecture is more
reliable than the station switch for the star architecture, which requires more Ethernet
interfaces. The reliability and cost figures for the Ethernet Media (EM) also depend on
the geographic distribution (cable length). For the repairable system, it is assumed that
all faulted devices can be detected and fixed or replaced within 24 hours. The
relative costs of the components are rough estimates, used to reflect the variation in
the cost of the SAS introduced by implementing different levels of redundancy.
Table 3-3: Substation Component Reliability Data
Devices             MTTF (years)   MTTR (hours)   Relative cost
Bay P&C IEDs        100            24             10
MU                  300            24             4
TS                  100            24             4
IED SW              500            24             3
Bay SW              300            24             4
Station SW (star)   250            24             5
EM (bay level)      800            24             0.1
EM (Middle)         600            24             0.2
EM (Long)           400            24             0.4
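As a starting point for the RBD calculations, the Table 3-3 data can be converted into per-hour failure rates, mission-time reliabilities and single-component availabilities. The sketch below assumes 8760 hours per year and the 10^4-hour mission time used in Section 3.4; the function name is illustrative.

```python
import math

HOURS_PER_YEAR = 8760
MISSION_TIME_H = 1e4  # mission time used in Section 3.4

# Table 3-3: device -> (MTTF in years, MTTR in hours, relative cost)
TABLE_3_3 = {
    "Bay P&C IEDs": (100, 24, 10),
    "MU": (300, 24, 4),
    "TS": (100, 24, 4),
    "IED SW": (500, 24, 3),
    "Bay SW": (300, 24, 4),
    "Station SW (star)": (250, 24, 5),
    "EM (bay level)": (800, 24, 0.1),
    "EM (Middle)": (600, 24, 0.2),
    "EM (Long)": (400, 24, 0.4),
}


def component_indices(mttf_years: float, mttr_hours: float, t_hours: float = MISSION_TIME_H):
    """Return (failure rate per hour, R(t), steady-state availability) for one component."""
    mttf_hours = mttf_years * HOURS_PER_YEAR
    lam = 1.0 / mttf_hours                                  # Eq. (3-2) rearranged
    reliability = math.exp(-lam * t_hours)                  # Eq. (3-1)
    availability = mttf_hours / (mttf_hours + mttr_hours)   # Eq. (3-14), single component
    return lam, reliability, availability


results = {
    name: component_indices(mttf, mttr) for name, (mttf, mttr, _cost) in TABLE_3_3.items()
}
for name, (lam, rel, avail) in results.items():
    print(f"{name:18s} lambda = {lam:.3e} /h   R(1e4 h) = {rel:.5f}   A = {avail:.7f}")
```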
3.4. Reliability Assessment of SAS Communication Services
The advent of IEC 61850 standardized the communication services within the
substation and thereby fulfils the interoperability requirements. According to IEC 61850,
the communication services cover three main applications:
Communication between a bay IED and a substation level client (HMI, NCC
gateway or substation host), e.g. control and reporting services.
Communication between different bay IEDs, e.g. interlocking by Generic Object
Oriented Substation Event (GOOSE) messages.
Transmission of digitized data from the Merging Unit to IEDs, and of GOOSE messages
from an IED to a circuit breaker.
Consequently, to evaluate the reliability of SAS communication architectures, it is
necessary to specify the communication services studied, which serve as the basis of
data transfer in the SAS. In this section, reliability assessment models are built for the
different communication services, namely the two-terminal and multi-terminal
communication reliability assessment models. The evaluation is accomplished by
quantitatively assessing the reliability of the communication path using the
reliability assessment method described above.
3.4.1. Reliability of Two-terminal Communication
The client-server service can be considered as two-terminal communication, mainly
between a bay unit and a station client (HMI, NCC gateway or substation host). One of
the most important communication services is the reporting function, which transfers
information including measurements, targets and switchgear status from a bay unit (i.e.
server) to a station client. Data acquired by the substation level client can be used for
different applications such as energy management and wide area monitoring and control.
Reliability block diagrams (RBDs) are used to describe the logical connections of the
components needed to fulfil the reporting service for the eight proposed SCN
architectures. Examples of the communication path for the reporting service from a bay
unit to the substation client for different SCN architectures are shown in Figure 3-12.
Figure 3-12: Reliability Block Diagram of different SAS Architectures for Reporting
Service. Panels (a) to (h) show Arch1 to Arch8 as listed above; the blocks are TS, MU, EM,
SW, IED 1, IED 2, Bay SW and Station SW, with primary and secondary paths in the ring
architectures.
The components used to fulfil a function are placed in series, while the duplicated
communication paths in the LANs and process buses are placed in parallel because of the
seamless redundancy provided. The Merging Unit of the local bay digitizes all the current
and voltage samples and transmits them to the bay IEDs through the process bus in a
time-synchronised manner. The trip signals from the protection IEDs can then be
transferred to the station client through the substation communication network and
sent to the National Control Centre (NCC) for various applications. Assuming
exponentially distributed component failures, the reliability of the SAS at a mission
time of 10^4 hours can be estimated using Equations (3-1) to (3-6). The MTTF of
performing a reporting service in the different SAS architectures is considered both with
and without repair. The MTTF without repair is the expected time until the SAS fails to
perform the communication service when components are not repaired. The MTTF with
repair is the expected time until a second component fails before the first fault has been
fixed, so that the entire system is declared unavailable. These two reliability indices are
calculated using Equation (3-5) and Equation (3-13) respectively.
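The series/parallel composition of an RBD can be expressed directly as functions of time. The sketch below builds a hypothetical, simplified reporting path (it is not a transcription of Figure 3-12 and will not reproduce Table 3-4), evaluates R(t) at 10^4 hours and approximates the MTTF without repair through Eq. (3-5). Component MTTFs come from Table 3-3; the block counts and names are assumptions.

```python
import math

HOURS_PER_YEAR = 8760
MISSION_TIME_H = 1e4


def comp(mttf_years: float):
    """Eq. (3-1): R(t) of an exponential component, with t in hours."""
    lam = 1.0 / (mttf_years * HOURS_PER_YEAR)
    return lambda t: math.exp(-lam * t)


def series(*blocks):
    return lambda t: math.prod(b(t) for b in blocks)


def parallel(*blocks):
    return lambda t: 1.0 - math.prod(1.0 - b(t) for b in blocks)


def mttf_years(r_sys, horizon_years: float = 5000.0, steps: int = 50_000) -> float:
    """Eq. (3-5) approximated by the trapezoidal rule; result in years."""
    T = horizon_years * HOURS_PER_YEAR
    h = T / steps
    total = 0.5 * (r_sys(0.0) + r_sys(T)) + sum(r_sys(k * h) for k in range(1, steps))
    return total * h / HOURS_PER_YEAR


# Component MTTFs (years) from Table 3-3
TS, MU, IED, IED_SW = comp(100), comp(300), comp(100), comp(500)
BAY_SW, STN_SW, EM = comp(300), comp(250), comp(800)

# Hypothetical, simplified reading of a reporting path (not the exact Figure 3-12 RBDs)
bay_part = series(TS, MU, EM, IED_SW, EM, IED)
lan_path = series(EM, BAY_SW, EM, STN_SW, EM)

single_star = series(bay_part, lan_path)
double_star = series(bay_part, parallel(lan_path, lan_path))  # PRP: LAN A or LAN B

for name, r_sys in (("single star", single_star), ("double star", double_star)):
    print(f"{name}: R(1e4 h) = {r_sys(MISSION_TIME_H):.4f}, MTTF = {mttf_years(r_sys):.2f} years")
```

For the all-series case the numerical MTTF can be cross-checked against 1/Σλ, which is a convenient sanity test before moving to structures with redundancy.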
Table 3-4: Reliability Assessment Results for Reporting Service
Architecture   Reliability (%)   MTTF without repair (years)   MTTF with repair (years)
Arch 1         97.70             37.2                          53.09
Arch 2         98.04             38.25                         63.82
Arch 3         99.20             47.92                         151.85
Arch 4         99.21             48.39                         151.86
Arch 5         98.41             40.05                         81.62
Arch 6         98.76             41.19                         110.06
Arch 7         99.92             51.78                         321274
Arch 8         99.93             52.31                         364632
Figure 3-13: MTTF & Cost of Considered SCN Architectures
As can be observed in Figure 3-13, the SCN architectures with high overall reliability
and high relative cost are located at the top-right corner of the graph, whilst the SCN
architectures with low reliability and low relative cost are located at the bottom-left
corner. It can be concluded that introducing redundancy at both the process level and the
station level enhances the reliability of the reporting service in the different SAS
architectures. The RSTP/HSR ring LAN architecture delivers higher reliability than the
single star LAN owing to the inherent redundancy in its communication path. In addition,
the PRP based double-star and double-ring station buses significantly improve the
reliability of transferring the monitored data from bay IEDs to the station clients.
Nevertheless, duplicating the process bus in each bay on top of a duplicated station bus
significantly increases the cost of the SAS (e.g. Arch4 versus Arch8). In practice, the
system reliability must therefore be assessed together with the actual cost of the
equipment and its maintenance.
A timely component repair significantly increases the MTTF of the studied
communication service in all the SAS architectures. The improvement is especially
pronounced for the architectures with full redundancy from the station bus down to the
process bus (i.e. Arch7 and Arch8). For example, repairing the defective component in
Arch8 increases the MTTF from 52.31 to 364632 years. This emphasizes the importance of
maintenance testing and of the self-monitoring or self-testing functions deployed in the
devices in maintaining system reliability. In addition, this two-terminal communication
reliability model can also be applied to evaluate the reliability of bay switchgear
monitoring and control, and of communication between two bay level IEDs.
3.4.2. Reliability of Multi-Terminal Communication
The application of Internet Group Management Protocol (IGMP) [57] allows
multicast data (e.g. GOOSE messages) to be filtered and transferred only to the
designated IEDs. Multicast communication plays a vital role in executing distributed
functions such as interlocking, auto-reclosing and breaker failure protection (BFP). This
type of application requires the exchange of time-critical multicast Ethernet frames from a
local bay to multiple recipients located in different bays (usually more than two)
via the LAN. Consequently, a multi-terminal communication model is developed to assess
the reliability of the signal path for multicast communication.
The breaker failure protection is taken as an example of substation multicast
communication, and its reliability is studied in this section. Given the importance of
Power System protection, and considering that the primary circuit breaker might
fail to operate, breaker failure protection is often implemented to enhance the
dependability of the protection system. To assess the reliability of the BFP, different
substation arrangements need to be considered, as illustrated in Figure 3-14. When a
fault occurs on the transmission line between stations B and C and breaker 3 fails to
clear it, BFP fault clearing requires the tripping of different circuit breakers in each
arrangement. Consequently, the impact on system integrity differs between arrangements.
Figure 3-14: Breaker Failure Protection for Different Station Arrangements: (a) Single
Bus at Station B. (b) Ring Bus at Station B. (c) Breaker-and-a-half at Station B.
For the single bus arrangement shown in Figure 3-14 (a), a failure of breaker 3
requires the tripping of all the breakers connected to Bus B (i.e. breakers 2, 5 and 7) to
isolate the fault. This splits the system at Bus B. For the ring bus arrangement
shown in Figure 3-14 (b), the control logic requires the tripping of breakers 3 and 5 to
clear the fault. A failure of breaker 3 to operate requires the BFP function to trip
breaker 2. This leaves the fault connected to line AB and thus requires the tripping of
breaker 1 by remote backup. Consequently, transmission line AB is taken out of service
by the BFP. For the breaker-and-a-half arrangement shown in Figure 3-14 (c), the BFP
must trip breaker 2 when breaker 3 fails to operate. This arrangement allows the system
to clear the fault while keeping all the other lines connected to Bus B in service.
To achieve the BFP application, the local bay where the BFP function resides must send
trip signals to the other relevant bays via the LAN to successfully execute the backup
protection function. The impact of BFP on system integrity is affected by the station
arrangement. However, from the point of view of the secondary SAS, the probability of
successful execution of the BFP function is affected only by the number of breakers that
must be tripped.
Assuming that N circuit breakers need to be tripped by the BFP to clear the fault,
the multi-terminal communication path for executing the BFP function in a double-star
LAN is shown in Figure 3-15. Redundancy in the station level LANs provides parallel
communication paths for the multicast Ethernet packets from the local bay to the
destination bays via either LAN A or LAN B.
Figure 3-15: Communication Path of Arch7 for Distributed Function
As indicated in Figure 3-14, three breakers in Substation B need to be tripped
by the BFP function for the single bus station arrangement, whereas only one
breaker needs to be tripped for the other two station arrangements in case of a breaker
CB3 failure. The reliability block diagram is insufficient for evaluating multicast
communication because of the complex multi-terminal communication path.
Therefore, a stochastic Monte Carlo simulation based method is used to calculate
the reliability of the studied communication service. The reliability of each component
is calculated using Equation (3-1), assuming that component lifetimes are
exponentially distributed. The reliability at a mission time of 10^4 hours is studied.
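A minimal Monte Carlo sketch of the multi-terminal evaluation, assuming a simplified Arch7-like double-star model: each bay needs an IED and an IED switch, each bay connects to LAN A and LAN B through its own bay switch and Ethernet media, and each LAN has one station switch. A recipient is reached if at least one LAN path is intact end to end. The shared station switches couple the recipients, which is why a plain RBD product is no longer adequate. The topology, MTTF assignments, trial count and names are all assumptions.

```python
import math
import random

HOURS_PER_YEAR = 8760
MISSION_TIME_H = 1e4

# Hypothetical simplified double-star (Arch7-like) model; MTTFs in years from Table 3-3
BAY_CHAIN = (100, 500)   # bay P&C IED and IED switch, needed in every bay
LAN_LINK = (300, 800)    # bay switch and bay-level Ethernet media, per LAN per bay
STATION_SW = 250         # one station switch per LAN
LANS = ("A", "B")


def up(mttf_years: float, rng: random.Random) -> bool:
    """Sample an exponential lifetime and check that it outlasts the mission time."""
    lam = 1.0 / (mttf_years * HOURS_PER_YEAR)
    return rng.expovariate(lam) > MISSION_TIME_H


def sample_bay(rng: random.Random):
    chain_ok = all(up(m, rng) for m in BAY_CHAIN)
    links = {lan: all(up(m, rng) for m in LAN_LINK) for lan in LANS}
    return chain_ok, links


def simulate(max_recipients: int = 3, trials: int = 50_000, seed: int = 1):
    rng = random.Random(seed)
    successes = [0] * (max_recipients + 1)
    for _ in range(trials):
        station_ok = {lan: up(STATION_SW, rng) for lan in LANS}
        src_ok, src_links = sample_bay(rng)
        recipients = [sample_bay(rng) for _ in range(max_recipients)]
        for n in range(1, max_recipients + 1):
            # Each of the first n recipients must be reachable via LAN A or LAN B
            delivered = src_ok and all(
                r_ok and any(station_ok[lan] and src_links[lan] and r_links[lan] for lan in LANS)
                for r_ok, r_links in recipients[:n]
            )
            if delivered:
                successes[n] += 1
    return {n: successes[n] / trials for n in range(1, max_recipients + 1)}


estimates = simulate()
for n, r in estimates.items():
    print(f"{n} recipient(s): R(1e4 h) ~ {r:.4f}")
```

Because the same sampled components are reused for 1, 2 and 3 recipients, the estimates are guaranteed to be non-increasing in the number of recipients, which matches the trend in Table 3-5 qualitatively; the absolute values depend on the assumed topology.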
The reliability of peer-to-peer communication with 1, 2 and 3 recipients was assessed,
and the results are shown in Table 3-5 and Figure 3-16.
Based on the simulation results, the reliability of executing a distributed function in a
SAS decreases as the number of recipients increases. Therefore, the ring and
breaker-and-a-half station arrangements, which require only one back-up breaker to
execute the BFP function, provide higher reliability than the single
bus station arrangement (3 back-up breakers).
Table 3-5: Reliability Data for Conducting Distributed Functions
SAS Arch   1 Recipient (%)   2 Recipients (%)   3 Recipients (%)
Arch 1     95.880            93.767             91.756
Arch 2     96.791            95.003             93.254
Arch 3     97.825            96.490             95.113
Arch 4     97.858            96.502             95.170
Arch 5     97.843            96.974             96.152
Arch 6     98.838            98.252             97.682
Arch 7     99.830            99.747             99.641
Arch 8     99.866            99.791             99.708
Figure 3-16: Reliability of SAS to Perform Multi-Terminal Communications (reliability,
axis range 0.91 to 1, versus SCN architecture, Arch 1 to Arch 8, for 1, 2 and 3 recipients)
As with two-terminal communication in a substation, implementing a redundant process
bus and station bus significantly enhances the performance of peer-to-peer
communication. The improvement in reliability is especially pronounced when there
are more recipients in the multi-terminal communication path. The single star LAN
architecture (Arch 1 and Arch 5) may not be sufficient for distributed functions, since its
station switch is a single point of failure that greatly compromises the reliability of the
system. It may also cause additional latency for time-critical communication due to the
traffic concentrated at the central station switch. The fully redundant architectures
(Arch 7 and Arch 8) have the best performance among all the architectures, with
reliabilities of 99.748% and 99.793% respectively. When the communication network of
the SAS is fully redundant, an increase in the number of recipients results in a smaller
reduction in system reliability.
Only two process bus architectures with different levels of redundancy are considered in
this chapter. Alternatively, the breaker IEDs could be connected directly to the bay
switches instead of via the process bus. In this case, better performance in both
reliability and latency can be achieved, since the number of components in the
communication path is reduced. However, this requires the breaker IED to incorporate
the BFP related logic.
3.4.3. Sensitivity Analysis
Due to the high uncertainty of the data used in the reliability assessment, a sensitivity
analysis is carried out to determine the impact of variations in the assumed data on
the evaluation results. In addition, the sensitivity study can be used to identify the
weakest and most critical components in the system. This could help improve the
overall performance and allow enhanced inspection and maintenance to be allocated
to the critical components. Two methods are used in the sensitivity study:
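Risk Reduction Worth (RRW): the RRW index of a component i is the ratio of the baseline
system unreliability to the system unreliability obtained when the examined component is
made perfect ($\lambda_i = 0$), whilst keeping the failure rates of all other components at
their original values:
$$RRW(i) = \frac{Q_{sys}(\text{base})}{Q_{sys}(\text{base} \mid \lambda_i = 0)}$$ (3-15)
where $Q_{sys} = 1 - R_{sys}$ is the system unreliability.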
Wide Range Method: the reliability data of each component are varied over a wide
range to examine the impact of the component’s reliability on the overall system reliability.
This method helps identify the component that has the most significant impact on the
reliability of the overall system.
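Eq. (3-15) can be sketched for a hypothetical all-series reporting path using MTTFs from Table 3-3 and assumed component counts (four Ethernet media links). Because the real Arch1 and Arch8 RBDs contain more components and redundancy, the values will not match Table 3-6; the names below are illustrative.

```python
import math

HOURS_PER_YEAR = 8760
MISSION_TIME_H = 1e4

# Hypothetical all-series reporting path; MTTFs (years) from Table 3-3, counts assumed
MTTF = {"IED": 100, "TS": 100, "MU": 300, "SW": 250, "EM": 800}
COUNT = {"IED": 1, "TS": 1, "MU": 1, "SW": 1, "EM": 4}
BASE_RATES = {k: 1.0 / (m * HOURS_PER_YEAR) for k, m in MTTF.items()}


def unreliability(rates):
    """Q_sys = 1 - R_sys at the mission time for a series structure of exponential parts."""
    total_rate = sum(COUNT[k] * rates[k] for k in rates)
    return 1.0 - math.exp(-total_rate * MISSION_TIME_H)


def rrw(name: str) -> float:
    """Eq. (3-15): baseline unreliability divided by unreliability with lambda_name = 0."""
    perfect = {**BASE_RATES, name: 0.0}
    return unreliability(BASE_RATES) / unreliability(perfect)


for name in MTTF:
    print(f"RRW({name}) = {rrw(name):.3f}")
```

In a pure series structure the RRW simply tracks each component type's share of the total failure rate (count times λ), so the four EM links rank below a single IED here; redundancy in the real architectures is what shifts the ranking reported in Table 3-6.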
The RRW index of each SAS component in performing the reporting service is
calculated. Table 3-6 shows the RRW indices of components in Arch1 and Arch8,
which are the two SAS architectures with the lowest and the highest reliability
respectively.
Table 3-6: RRW of each component in Arch 1&8
Arch EM SW TS MU IEDs
Arch 1 1.58 1.83 1.01 1.19 1.01
Arch 8 1.39 1.44 1.24 1.25 1.93
A higher RRW index indicates that the component has a greater impact on the overall
system reliability. For Arch1, the greatest improvement in overall reliability is achieved by
improving the reliability of the station switch in the star LAN, which is the single
point of failure in the communication system. The dominating impact of the central
switch on SAS reliability can be effectively reduced by introducing the redundant ring
LAN architecture used in Arch8. In that case, the IEDs, which are considered the
least reliable devices, have the greatest impact on the overall reliability ($RRW_{IED} = 1.93$).
Therefore, it can be concluded that the importance of a device depends on its reliability,
its quantity, its location in the system and the overall communication architecture.
Figure 3-17: Impact of MTTF on System Unreliability for Arch 1
Figure 3-17 shows the variation in system reliability when the Wide Range Method is
applied to the assessment of Arch1. The MTTF of each SAS component is varied over a
wide range, from 0.1 to 5 times its original value ($MTTF_{base}$). In general, the
reliability of the communication service increases as the MTTF of each component
increases. Consistent with the RRW index, it can be observed that the reliability
of Arch1 is most sensitive to the station Ethernet switch. Increasing the reliability of
this sensitive component leads to the greatest enhancement in system reliability, whereas
a decrease in its reliability significantly compromises the overall reliability.
Devices such as the Ethernet Media (EM), although highly reliable, still have a high
impact on system reliability in both architectures because of the large number deployed
compared with other components.
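The Wide Range Method can be sketched on the same kind of hypothetical all-series path: each component type's MTTF is scaled by factors between 0.1 and 5 of its base value while the others are held at their base values, and the system unreliability at 10^4 hours is recorded. The path composition, component counts and multiplier grid are assumptions, and the names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760
MISSION_TIME_H = 1e4

# Hypothetical all-series path; base MTTFs (years) from Table 3-3, counts assumed
BASE_MTTF = {"IED": 100, "TS": 100, "MU": 300, "SW": 250, "EM": 800}
COUNT = {"IED": 1, "TS": 1, "MU": 1, "SW": 1, "EM": 4}
MULTIPLIERS = (0.1, 0.25, 0.5, 1.0, 2.0, 3.0, 5.0)  # MTTF / MTTF_base


def unreliability(mttf_years):
    """System unreliability at the mission time for the assumed series structure."""
    total_rate = sum(COUNT[k] / (mttf_years[k] * HOURS_PER_YEAR) for k in mttf_years)
    return 1.0 - math.exp(-total_rate * MISSION_TIME_H)


def wide_range(name):
    """Vary one component's MTTF over MULTIPLIERS, keeping the others at base values."""
    return [unreliability({**BASE_MTTF, name: BASE_MTTF[name] * m}) for m in MULTIPLIERS]


sweeps = {name: wide_range(name) for name in BASE_MTTF}
for name, q in sweeps.items():
    print(f"{name:3s}: Q_sys from {q[0]:.4f} (0.1x) to {q[-1]:.4f} (5x)")
```

The component whose sweep spans the widest range of Q_sys is the one to which the system is most sensitive, mirroring the reading of Figure 3-17.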
3.5. Summary
ICT infrastructure plays a vital role in the economical and reliable operation of Power
Systems. It also helps to improve the resilience of Power Systems against stressed
conditions and wide area disturbances. This chapter provides an overview of the
information and communication technology used in SIPS applications. A detailed
description of the communication architecture, from the wide-area communication
network down to the IEC 61850 based substation automation system, is provided. It is
vitally important to effectively assess the reliability of the communication architecture
to ensure successful SIPS operation.
The IEC 61850 based substation is a node of SIPS which collects measurements and
implements control actions. A reliability assessment method based on both analytical
and stochastic methods is developed in this chapter to quantitatively assess the
reliability of various communication services in a SAS. Redundancy in the station bus
and process bus communication networks is implemented in accordance with specific
protocols.
The results show that the RSTP based ring station bus architecture is a reliable and cost
effective solution for improving the performance of substation communication. It provides
significantly enhanced reliability in performing distributed functions that require
multicast communication. In contrast, the reliability of the single star LAN
architecture may not be sufficient for distributed functions, since the station switch
is a single point of failure that may greatly compromise the reliability of the system and
cause additional latency for time-critical communication. The implementation of an
IEC 62439-3 PRP based duplicated station bus (i.e. double star or double ring) is an
effective solution for fulfilling the availability and performance requirements of
communications in the SAS. Redundancy in the process and bay level components also
significantly improves the reliability of substation communication. The duplicated
process buses, implemented in accordance with IEC 61850-9-2, introduce an additional
communication path between the bay protection and control IEDs and the process
level devices. Since redundancy is an expensive means of enhancing reliability, it should
be applied only to mission-critical components. In addition, repair has a vital role in
maintaining the reliability of the substation automation system. If a defective component
can be fixed or replaced in a timely manner, the MTTF and availability of the system can
be significantly increased.
The sensitivity study helps identify the most critical devices for maintaining
the reliability of different substation communication architectures. More maintenance
and inspection effort can then be allocated to these critical devices. The use of the
wide range method and RRW in the sensitivity analysis indicates that the impact of a
component on the reliability of the SAS depends on the component’s position in the
system, its reliability and the quantity used.
CHAPTER 4
PROTECTION AND CONTROL ASSET
END-OF-LIFE ANALYSIS
4.1. Introduction
The previous chapters discussed the application of advanced information and
communication technology (ICT) and raised concerns about its impact on the reliability
of numerical protection systems. However, the ageing of protection equipment is
another major challenge faced by utilities. In the UK, for example, protection equipment
based on the IEC 61850 standard has not yet been widely applied on the transmission
network. UK National Grid has approximately 1,200 circuit bays associated with its main
interconnecting transmission lines. These bays predominantly use electronic
protection equipment (i.e. analogue or early numerical relays) to detect and clear
short-circuit faults. A significant number of these protection devices are now reaching
their design lifetime, and consequently the protection and control systems are expected to
become less reliable, with an increasing number of ageing related failures. This could
potentially lead to degraded system performance, since the aged protection devices
may not be able to provide the effective measurements required for emergency control
during a system disturbance.
Chapter 4: Protection and Control Asset End-of-life Analysis
Due to the critical fault clearance function of protective relays, they must be maintained
in the most reliable state and must be replaced before they show a pattern of
maloperation indicating that the end of life has already been reached. Consequently, it is
critical for UK National Grid to effectively assess the operational condition of these
relays and predict their expected reliable service life. The current anticipated life of the
protection and control assets and National Grid’s replacement policy are based on the
manufacturers’ information, on operational experience with prior generations of similar
equipment, and on generic industry observations. This is known as the “manufacturer
defined end-of-life”. However, this may not reflect the actual relay end-of-life, since it
does not account for the actual operating and environmental conditions of the equipment.
Therefore, it is critical to evaluate whether the replacement ages specified in the current
policy reflect the actual relay end-of-life and yield the best predicted reliability of
service and use of National Grid resources.
According to the IEEE Power System Relaying and Control Committee (PSRC) [58],
the end of expected life of a protection, control or metering device (i.e. the device’s
actual end-of-life) is the time in its lifecycle when any of the following stages is
reached:
1) The device is not able to perform as per its design specification and it is not
possible to repair it.
2) The device has reduced technical support (parts, spares and expertise) due to product
obsolescence, and the cost of repair outweighs the benefits of a newer device.
3) The device is no longer useful and no longer meets present functional requirements.
The end of expected life is determined not only by identified deterioration in condition
or performance, but also by the reduced availability of technical support (parts, spares
and expertise) due to product obsolescence. The useful life is described by IEC as “the
time interval beginning at a given moment in time, and ending when the failure intensity
becomes unacceptable or when the item is considered to be unrepairable as a result of a
fault (IEV 191-19-06).” Consequently, if sufficient technical support or spares are
available and the device is still able to meet the required functions, the end-of-life
of the device is reached when its failure intensity becomes unacceptable.
Reliability and lifetime assessment techniques have historically been based on statistical
failure rate models obtained from historical field failure rates. One commonly used
method to predict the end-of-life is the “bathtub curve” shown in Figure 4-1, in which
component failures are recorded according to the age at which they occur. The high
infant mortality in the first stage of the bathtub curve is caused by defects designed
or built into the product. In the second stage, failures occur randomly during the
product’s useful life. This is the period in which the failure rate stays constant and
relatively low. Failures in this stage are not ageing related. However, the failure rate is
expected to increase dramatically once the product exceeds its reliable service life and
enters the end-of-life stage.
Figure 4-1: Bathtub Curve for End-of-life Assessment
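The bathtub shape of Figure 4-1 can be reproduced numerically by summing three competing hazards: a decreasing Weibull hazard for infant mortality, a constant random-failure rate for the useful life period, and an increasing Weibull hazard for wear-out. The sketch below is purely illustrative: the shape and scale parameters and the 0.05 per year "unacceptable" threshold are hypothetical assumptions, not values fitted to any relay population. The hypothetical helper `end_of_useful_life` mirrors the idea that useful life ends when the failure intensity becomes unacceptable.

```python
# Illustrative bathtub curve: total hazard = infant + random + wear-out.
# All parameter values are hypothetical, chosen only to produce the shape.

def weibull_hazard(t, beta, eta):
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)

INFANT = (0.5, 50.0)    # beta < 1: hazard decreases with age (early defects)
RANDOM_RATE = 0.01      # constant hazard (per year) in the useful life period
WEAR_OUT = (5.0, 35.0)  # beta > 1: hazard increases with age (ageing)

def bathtub_hazard(t):
    """Total failure intensity (per year) at age t, as a sum of competing hazards."""
    return weibull_hazard(t, *INFANT) + RANDOM_RATE + weibull_hazard(t, *WEAR_OUT)

def end_of_useful_life(threshold, step=0.1, horizon=60.0):
    """First age after the hazard minimum at which the hazard reaches threshold.

    Returns None if the threshold is not reached within the horizon.
    """
    n = round(horizon / step)
    ages = [i * step for i in range(1, n + 1)]
    hazards = [bathtub_hazard(t) for t in ages]
    start = hazards.index(min(hazards))  # skip the infant-mortality region
    for t, h in zip(ages[start:], hazards[start:]):
        if h >= threshold:
            return t
    return None

if __name__ == "__main__":
    for age in (0.5, 10, 20, 30, 40):
        print(f"age {age:>4} y: hazard {bathtub_hazard(age):.4f} per year")
    print("hazard reaches 0.05 per year at about",
          round(end_of_useful_life(0.05), 1), "years")
```

Modelling the total hazard as a sum corresponds to treating the three failure causes as independent competing risks, which is the usual way a bathtub curve is decomposed.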
The Protection and Control Asset Life Extension (ALE) project carried out by UK
National Grid is introduced in this chapter. The aim of the project is to identify the
critical life-limiting elements within electronic protection devices and to establish
assessment and testing criteria for determining the deterioration mechanisms and rates,
with the goal of establishing whether "apparently reliable" electronic relays could sustain good
performance for more years than their previously defined useful life.
4.1.1. Literature Review on End-of-Life Assessment
Lifetime assessment has become increasingly crucial to the reliable operation of the Power
System. Considerable effort has been devoted to evaluating the useful lifetime of
Power System components including transformers, cables, breakers, capacitors, reactors,
etc. In addition, the importance of incorporating ageing related failures into system
reliability evaluation has been emphasized, as this could facilitate decision
making in areas such as transmission development planning, transmission operation
planning, selection of substation configurations and reliability-centred maintenance [59].
Currently, most lifetime assessments in the Power System have focused on the
components of the primary system. Ageing related failures of primary equipment
are mainly caused by design defects or heavy loading. Unlike primary
equipment, a protection relay does not get hot, nor does it suffer failures related to the number
of faults (except for the input CTs). With protection and control devices playing an increasingly
important role in preserving system reliability, and more devices approaching their
end-of-life stage, a process to evaluate the operational condition of the devices and to
estimate their reliable service lifetime is required. The existing methodologies used to
determine the end of useful life of protection and control devices are reviewed in
this section:
1) PSRC Asset Health Index for Protection and Control Devices
A method to assess the health condition of the protection and control devices was
proposed by the PSRC working group. An asset health index (AHI) was used to
estimate the protection lifetime. The following factors, which may affect the reliable
service time of the devices, were considered [58]:
a) F (Manufacture): The factors that could affect product end-of-life from the
manufacturer's perspective include the viability of the manufacturer (i.e. the
likelihood of the manufacturer existing in the future), past performance experience
of the manufacturer (e.g. response to issues, quality control, turnaround time for
repairs), the technical support, spares availability and upgrade support from the
manufacturer, and the performance of the manufacturer's similar products.
b) F (Performance): The historical reliability performance of the device, or of devices with
similar characteristics, is examined. The Mean Time Between Failures (MTBF), the observed
performance during routine testing, the number of maloperations and unscheduled
maintenance, and self-reported failures can be used to indicate the condition of the device.
c) F (Utility): Factors from utilities which could impact end-of-useful life include:
the financial policy of the utility, the future direction of the company in terms
of other related areas (e.g. the replacement of RTUs, breakers, control IEDs,
etc.), the staff and resources to support the products and the utility’s operating
requirements to reduce the number of outages.
d) F (Industry): Industry experience, trends in device longevity, anticipated
standards under development and monitored performance will continuously
impact the end-of-useful life of a device. For example, the increase in the
application of Ethernet communication and the advent of IEC 61850
communication protocol may encourage the replacement of conventional
devices before their expected end-of-useful life.
e) F (Device): Non-performance-based factors relating to the device itself may also
influence product lifetime, such as the redundancy of the device in the
system, vulnerable components in the device (e.g. electrolytic capacitor) and
the environment (e.g. temperature, humidity, etc.).
An equation can be developed to integrate all the above factors and quantify the weight
of each factor according to its importance for the actual performance of the device:

$$F(\mathrm{derating}) = \delta\left[w_1 F(\mathrm{Manufacture}) + w_2 F(\mathrm{Performance}) + w_3 F(\mathrm{Utility}) + w_4 F(\mathrm{Industry}) + w_5 F(\mathrm{Device})\right] \qquad (4\text{-}1)$$

where F(derating) is the end-of-useful-life de-rating factor, δ is the overall importance
of end-of-useful life, and w_1 to w_5 are the weights assigned to the individual factors.
The other factors (i.e. F(Manufacture), F(Performance), F(Utility), F(Industry) and
F(Device)) are defined as in the preceding list. The estimated useful lifetime is then the
design expected life of the device multiplied by (1 - F(derating)). The equations and
parameters used in the AHI-based method provide one way of quantifying the end of useful
life and planning capital investment and equipment replacement, but they are not
scientifically determined. The method does not provide a way to assess the operating
conditions of the protection devices or to identify ageing related degradation within the
relay components. In addition, the availability of spares for a protection relay may not be
important unless the relay is likely to fail. For example, a NOKIA phone bought in the
1990s would probably still work in 2017 (if its battery were replaced). However, the
manufacturer probably stopped making spares in the late 1990s, so apart from the battery
nothing is repairable. Consequently, the identification of potentially vulnerable components
is vitally important in protection lifetime assessment.
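Equation (4-1) and the resulting life estimate can be written as a short calculation. This is a sketch under stated assumptions: the factor scores are assumed to be normalised to [0, 1], the weights w_1 to w_5 are assumed to sum to 1, and the function names and all numerical values in the example are hypothetical rather than taken from [58].

```python
FACTORS = ("Manufacture", "Performance", "Utility", "Industry", "Device")

def derating_factor(scores, weights, delta):
    """F(derating) = delta * sum_i w_i * F_i, as in Equation (4-1).

    ASSUMPTION: scores F_i in [0, 1], weights summing to 1 and delta in [0, 1],
    so that F(derating) also lies in [0, 1].
    """
    if abs(sum(weights[f] for f in FACTORS) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for f in FACTORS:
        if not 0.0 <= scores[f] <= 1.0:
            raise ValueError(f"score for {f} must lie in [0, 1]")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * sum(weights[f] * scores[f] for f in FACTORS)

def estimated_useful_life(design_life_years, f_derating):
    """Useful life = design expected life * (1 - F(derating))."""
    return design_life_years * (1.0 - f_derating)

# Hypothetical example values, for illustration only.
scores = {"Manufacture": 0.6, "Performance": 0.2, "Utility": 0.3,
          "Industry": 0.5, "Device": 0.4}
weights = {"Manufacture": 0.2, "Performance": 0.3, "Utility": 0.1,
           "Industry": 0.2, "Device": 0.2}
f = derating_factor(scores, weights, delta=0.5)
print(f"F(derating) = {f:.3f}, useful life = {estimated_useful_life(25, f):.1f} years")
```

The example also makes the weakness noted above visible: the result depends entirely on the chosen scores and weights, none of which is measured on the relay itself.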
2) Statistical Analysis
The conventional method for asset lifetime prediction is statistical analysis, in which
probability distribution functions are formulated from historical performance data.
The resulting probabilistic model can be either parametric or non-parametric [60].
In parametric statistical analysis, the reliability data of the product are fitted to a
suitable distribution function, such as the exponential, normal or Weibull distribution.
The parameters of the distribution function are estimated from the data, and the
suitability of the model is checked using a goodness-of-fit test. The non-parametric
method is used when no predefined distribution function can characterise the data; this
method is described in [61]. Statistical analysis has been used to model both age related
repairable failures [62, 63] and age related end-of-life failures [64, 65].
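As an illustration of the parametric approach, the sketch below fits a two-parameter Weibull distribution to a set of failure ages by median rank regression, one common estimator (maximum likelihood is an equally standard alternative). It assumes complete data, i.e. every unit in the sample has failed; relays still in service are right-censored observations and would require a censoring-aware method instead. The function name and the failure ages in the example are hypothetical.

```python
import math

def fit_weibull_mrr(failure_ages):
    """Fit a two-parameter Weibull by median rank regression (complete data only).

    Linearises F(t) = 1 - exp(-(t/eta)**beta) as
    ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta) and fits a least-squares line.
    Returns (beta, eta, r_squared).
    """
    t = sorted(failure_ages)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure ages")
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    beta = sxy / sxx                   # slope = shape parameter
    intercept = y_bar - beta * x_bar   # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)  # scale parameter
    r_squared = sxy ** 2 / (sxx * syy) # simple goodness-of-fit indicator
    return beta, eta, r_squared

if __name__ == "__main__":
    ages = [18.2, 22.5, 24.9, 27.3, 29.8, 31.0, 33.6, 36.1]  # hypothetical, years
    beta, eta, r2 = fit_weibull_mrr(ages)
    print(f"shape beta = {beta:.2f}, scale eta = {eta:.1f} years, R^2 = {r2:.3f}")
```

A shape parameter above 1 would indicate an increasing (wear-out) hazard; the R^2 value is only a rough check, and a formal goodness-of-fit test would be preferred in practice.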
3) Sample Testing
Another approach to assessing product end-of-life is sample testing of specific
devices. Samples with different service histories can be tested and compared to check
whether there is any degradation in the function of the complete relay or of its components.
Both the overall system-level product performance and the component condition can be
checked to determine whether a device can continue to function for some additional,
to-be-determined period of time. Unlike the statistical analysis method, with its inherent
uncertainty, the sample test method identifies the root causes of potential failures and
correlates them with physical wear-out and failure mechanisms, thus delivering a more
meaningful and accurate lifetime prediction. However, a clear understanding of the
operating conditions and degradation mechanisms of the vulnerable components is
required.
If a sample with an age related failure is available, failure analysis can be performed on the
aged product to determine the root cause of the failure. However, it is important to
distinguish protection maloperations caused by ageing from other types of failures. A
detailed protection performance record can also be useful in failure analysis.
4.1.2. Asset Life Extension (ALE) Project Test Process
The previously discussed methods have some limitations in predicting the reliable
service life of protection devices. The asset health index (AHI) method requires a large
amount of information from different sources, and it is difficult to determine precisely the
weight of each factor on the overall relay performance. In addition, the assessment of asset
lifetime should be undertaken during the useful life period of the equipment, that is,
before ageing related failures appear in significant numbers. Owing to the extremely reliable
performance of protection devices, statistically significant failure data for a specific
population of equipment will not be available before evidence of end-of-life actually
appears. Consequently, a generic asset end-of-life investigative process, consisting
of statistical analysis, functional testing and invasive examination, is proposed in this
chapter to validate or forecast the reliable service life of a particular relay type. As shown in
Figure 4-2, the process steps are as follows:
1) Field Performance: The first step is to review the historical records for each
specific relay type to identify any recorded hardware problems. Depending on the
documents available, these records could include relay population, age profile,
maloperation history and causes, failure and repair history, and reports of benchmark
experience from other utilities.
2) Physical Inspection: Next, the conditions of relay samples removed from service
are examined. The disassembled relay modules are inspected visually to check for
cracks, loose or damaged interconnections, heat damage, or signs of corrosion or
contamination. Any components whose industry history, or as-found condition, makes
them targets for further evaluation are listed.
3) Fingerprint Testing: The operational behaviour of the relays is tested to check
whether there is any degradation in functional performance compared with the design
specifications. The operating characteristics are also compared with those of
contemporary replacement relay types to determine whether new relays would offer a
meaningful performance improvement that could influence the replacement decision
process.
4) Stress Testing: Test the relay input current transformers (CTs) under simulated in-
service heavy load and fault conditions, looking for thermal stress that could impact life.
In addition, measure the voltage stress on voltage-rated components in the power supply.
5) In-Depth Component Evaluation: The temperature of components within energized
modules is characterized using thermal imaging and non-destructive structural
evaluation techniques designed to identify any potential life-limiting conditions. Hot
components are compared with their rated capabilities, and with electronic product
industry experience related to levels of heating versus reliability impact. Components
requiring further investigation, based on thermal imaging or any other observations, are
examined using three-dimensional X-ray tomographic micro-imaging. These imaging
results reveal any signs of degradation or wear-out, allowing a determination of whether
stressed components are still sound and whether a specific life extension can be
forecast.
6) FMMEA: Perform failure mode, mechanism and effect analysis (FMMEA) with
regard to the function of each studied component in its relay module and in the overall
operation of the protective relay. The purpose is to identify particular modules or
components most likely to cause a problem with the correct operation of a protection
relay.
7) Conclusions: Analyse the results of all evaluations to determine whether the end of life
replacement requirement dictated in regulatory policy for each relay type can be
extended by five or more years. A life extension recommendation includes any
modification or component replacement action, and a targeted procedure for rechecking
the condition of stressed components after additional years of service.
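Step 5 compares hot components with their rated capabilities and with industry experience of heating versus reliability. For aluminium electrolytic capacitors, which appear among the vulnerable components discussed in this chapter, a widely used rule of thumb is that expected life roughly doubles for every 10 K reduction in operating temperature below the rated temperature. The sketch below applies only that rule; the function name, the 2000 h at 105 °C rating and the hot-spot temperatures are hypothetical, and a manufacturer's own life formula (which often adds ripple-current and voltage terms) should take precedence where available.

```python
HOURS_PER_YEAR = 8766  # 365.25 days * 24 h

def capacitor_life_hours(rated_life_h, rated_temp_c, operating_temp_c):
    """Estimated life using the 10-kelvin doubling rule of thumb.

    L = L0 * 2 ** ((T0 - T) / 10), where L0 is the rated life at rated temperature T0.
    Only temperature is modelled; ripple current and voltage effects are ignored.
    """
    return rated_life_h * 2 ** ((rated_temp_c - operating_temp_c) / 10.0)

# Hypothetical capacitor rated 2000 h at 105 degC, at several hot-spot temperatures.
for t_op in (45.0, 55.0, 65.0):
    life = capacitor_life_hours(2000.0, 105.0, t_op)
    print(f"{t_op:.0f} degC: {life:,.0f} h (about {life / HOURS_PER_YEAR:.1f} years)")
```

The strong sensitivity to temperature is why thermal imaging of energised modules is a useful screen for life-limiting components.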
Figure 4-2: ALE Project Investigation Process (stages: Field Performance; Physical Inspection; Fingerprint Testing; Stress Testing; In-Depth Component Evaluation; FMMEA; Conclusions)
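The FMMEA step needs some way of ranking failure mechanisms. One common approach, borrowed from conventional FMEA, is the risk priority number RPN = S × O × D, where severity, occurrence and detection are each scored from 1 to 10. The sketch below assumes that scheme; the class and function names are hypothetical, and although the entries echo component types mentioned in this chapter, their scores are invented placeholders, not findings of the ALE study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureMechanism:
    component: str
    mechanism: str
    severity: int    # 1 = negligible effect .. 10 = loss of protection function
    occurrence: int  # 1 = remote .. 10 = very likely within the remaining life
    detection: int   # 1 = always detected (e.g. self-monitoring) .. 10 = undetectable

    def __post_init__(self):
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

def rank_by_rpn(mechanisms):
    """Return the mechanisms sorted from highest to lowest RPN."""
    return sorted(mechanisms, key=lambda m: m.rpn, reverse=True)

# Hypothetical entries: the scores are placeholders, not study results.
candidates = [
    FailureMechanism("power supply electrolytic capacitor", "electrolyte dry-out", 7, 6, 5),
    FailureMechanism("voltage regulator IC", "thermal degradation", 8, 3, 4),
    FailureMechanism("input CT", "insulation thermal ageing", 9, 2, 6),
    FailureMechanism("reed relay", "contact wear", 6, 3, 7),
]
for m in rank_by_rpn(candidates):
    print(f"{m.rpn:4d}  {m.component}: {m.mechanism}")
```

A detection score near 1 rewards mechanisms that the relay's self-monitoring would reveal, which matches the distinction between unwanted tripping and a silent inability to trip.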
4.1.3. Benefits and Risks of Asset Life Extension
Replacing relays before they reach the end of their reliable service life
is not necessarily a better plan. Apart from the cost of equipment and system
outages, newly installed relays go through the infant mortality stage and must
be commissioned and debugged. In addition, modern microprocessor relay types do not
necessarily offer longer service life prospects than previous generations.
If a scientific investigation process can be developed and additional asset life
can be validated, the following benefits could be achieved:
2) Maintain protection reliability without investment in the installation of new relays
and protection schemes.
3) Reduce outages caused by protection replacement and enhance system reliability:
extending the life of protection could effectively reduce the frequency of system
interruptions caused by protection replacement. With respect to the "N-1" reliability
criterion, while an important feeder is out of service for protection replacement, an
outage of another transmission line could lead to severe system conditions and even
cascade tripping.
4) Avoid increases in the economic cost of energy due to protection replacement: the
replacement of protection could lead to an outage of important generation plant. For
example, when the protection on a line connected to an important nuclear station is being
upgraded, the output of the nuclear station needs to be rescheduled to more expensive
generators (e.g. gas generation), which leads to increased energy costs.
5) Avoid infant failures associated with new replacement.
6) Avoid application risks of new products including managing skills and resources.
7) Defer capital investment and release resources for other needs.
However, the following potential risks might be caused by extending the service life of
the existing relays:
1) Potential increase in end-of-life failures.
2) Create accumulated replacement problems within short windows.
3) Require resources and skills to manage ageing equipment.
These risks can be effectively mitigated by carrying out asset life assessment, setting up a
proactive asset replacement strategy, and undertaking succession planning and training.
4.2. UK National Grid Asset Life Extension Project
4.2.1. National Grid Protection and Control Asset Life Extension (ALE) Project
National Grid has approximately 1,200 circuit bays associated with its main
interconnecting transmission lines. These bays predominantly utilize electronic
(analogue or numeric) based protection equipment (protection relays) to detect and clear
faults or short circuits. The application of the current replacement policy will result in one
third of the protection equipment, in the 1,200 circuit bays, being replaced within the
next 8 years. This will lead to significant circuit outages, equipment purchase,
installation costs, and human resource requirements. However, since most of these
relays are still operating reliably without any sign of degradation, it is critical to
evaluate whether the replacement ages specified in the current policy yield the best
predicted reliability of service and deliver efficient use of National Grid resources.
Therefore, understanding the life-limiting characteristics of these relay types is
important for National Grid to optimize its replacement plans. The objective is to ensure
that National Grid replaces these protection devices neither too early, with unnecessary use of
resources, expenditure and system outages, nor too late, with increased risk of
ageing related failures, maloperations and unmanageable waves of replacement. To
achieve this, a scientific investigation process is required to establish the ageing
mechanisms applicable to the specific protection types. This includes functional tests and
invasive examination to determine the deterioration mechanisms.
The National Grid Protection and Control Asset Life Extension project comprises the
following major tasks:
1) Development of detailed scope, schedule, and information gathering process.
2) Develop processes and procedures for asset life extension evaluation.
3) Perform tests and investigations according to Task 2 processes and procedures;
document raw test results from university test laboratories for each studied relay type.
4) Investigate the operational behaviour of each studied relay type; identify life
limiting elements; determine whether reliable service life can be extended and issue
action plans.
5) Document processes and procedures for asset life extension evaluation developed
and undertaken by the project team.
4.2.2. Asset Life Extension (ALE) Study of Selected Protection Relays
The objective of the ALE study is to investigate the operational behaviour of three
specific types of protection relay (SHNB, THR and LFCB) and to determine whether end-of-life
failures have started to occur or might occur in the near future, or whether equipment
deterioration is becoming apparent. If the evaluation results indicate that no end-of-life
failures or age related deterioration have occurred, the results can be used to justify
extending the asset life expectation for the equipment types considered.
Table 4-1 shows the current UK National Grid policy on the reliable service lifetime of
different protection generations. National Grid has previously carried out work to
assess the reliable lifetime of various electromechanical relays. This enabled National
Grid to review the pre-determined reliable service time of the protection devices and
revise its replacement plan [66]: the anticipated lifetime of electromechanical relays
was successfully extended from 30 years to 40 years. In this project, this research
has been extended to cover more complex electronic relays, especially certain models of
multifunctional analogue solid state protective relays which have served reliably to date
but are nonetheless approaching their presently rated end-of-life date.
Table 4-1: UK National Grid Policy on Relay Lifetime
Relay Generation | Equipment Family | Anticipated Asset Lifetime (years) | Replace Window (years)
Electromechanical | Use mechanical force to operate a relay contact in response to a stimulus | 40 | -
Electronic | Complex relays using transistorised or integrated circuits | 25 | 20-35
Numerical | Digital (A/D converter and microprocessor) | 20 | 10-25
The equipment considered ranges from electronic equipment designed in the early
1980s comprising transistorised circuits, early semiconductors and integrated circuits,
to later equipment from the 1990s that contains numeric components including
microprocessors and analogue to digital converters. The three relay types studied are
described as follows:
1) The Alstom SHNB distance relay: The SHNB Micromho is a static distance protection
relay manufactured by Alstom (GEC Alstom) in Stafford (now part of GE). It is
designed to provide high speed phase and earth fault protection for high voltage or
extra high voltage overhead transmission lines. The SHNB design is based on
operational amplifiers and uncommitted logic arrays, and the relay was mainly installed
from 1985 to 1995. 156 units remain in service with National Grid.
2) The Reyrolle THR distance relay: The type THR is a multi-zone distance relay
manufactured by NEI Reyrolle in Hebburn (now part of Siemens) and was installed
in National Grid substations between 1980 and 1990. THR is based on early-1970s
discrete transistor circuit design. 184 units remain in service with National Grid.
3) The Alstom LFCB differential relay: LFCB is a transmission line current-
differential relaying system based on microprocessor technology with analogue to
digital converters, manufactured by GEC, Alstom, or Areva in Stafford (now GE)
and supplied to National Grid between 1993 and 2005. 213 units remain in service
with National Grid.
The on-site population, installation period and anticipated lifetime of each relay type are
shown in Table 4-2. The two analogue distance protection relays (i.e. SHNB and THR)
were mainly installed from 1980 to 2000, whilst the early numerical relay LFCB was
first introduced in the early 1990s. National Grid Policy Statement (Transmission) EPS
12.08, Issue 6, September 2011, Page 12 [67] presents the following range of asset lives
for the relay types studied. The reliable lifetime defined in the current policy, based on
manufacturers' information and operational experience, indicates that the anticipated
lifetime of the SHNBs and THRs is 25 years, with a replacement window of 20 to 35 years,
whilst the LFCB relays have a shorter anticipated lifetime of 20 years, with a replacement
window of 10 to 25 years.
Table 4-2: Relay Population and Anticipated Lifetime [67]
Relay Type | Quantity 1999 | Quantity 2005 | Quantity 2011 | Quantity 2014 | Installed | Anticipated Lifetime (years) | Replacement Window Earliest (years) | Replacement Window Latest (years)
SHNB | 357 | 318 | 229 | 156 | 1981-2001 | 25 | 20 | 35
THR | 390 | 369 | 231 | 184 | 1979-2002 | 25 | 20 | 35
LFCB | 117 | 245 | 228 | 213 | 1991-2004 | 20 | 10 | 25
Figure 4-3: UK National Grid Relay Age Distribution (by the end of 2014)
The age profile provided by National Grid shows the age distribution of each
studied relay type by the end of 2014, as illustrated in Figure 4-3. If the life extension
cannot be justified, 29.5% of the SHNB (46 units), 48.4% of the THR (89 units) and
30.0% of the LFCB (64 units) will need to be replaced in the next five years (by the end of
2019). This would lead to unmanageable waves of replacement, requiring significant
circuit outages, equipment purchase and installation costs, and human resources.
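The replacement windows in Table 4-2 and the five-year figures quoted above reduce to simple arithmetic. The sketch below reproduces the quoted percentages from the unit counts and 2014 populations, and shows how a replacement window follows from an installation year. The function names and the example installation year are hypothetical; the policy values and counts are those given in Table 4-2 and the text.

```python
# Policy values from Table 4-2: (anticipated lifetime, earliest, latest) in years.
POLICY = {
    "SHNB": (25, 20, 35),
    "THR": (25, 20, 35),
    "LFCB": (20, 10, 25),
}

# 2014 populations (Table 4-2) and units due within five years (quoted in the text).
POPULATION_2014 = {"SHNB": 156, "THR": 184, "LFCB": 213}
DUE_BY_2019 = {"SHNB": 46, "THR": 89, "LFCB": 64}

def replacement_years(relay_type, install_year):
    """Return (earliest, anticipated, latest) replacement years for one unit."""
    anticipated, earliest, latest = POLICY[relay_type]
    return install_year + earliest, install_year + anticipated, install_year + latest

def due_share_percent(due, population):
    """Percentage of the in-service population due for replacement."""
    return 100.0 * due / population

for relay in POLICY:
    share = due_share_percent(DUE_BY_2019[relay], POPULATION_2014[relay])
    print(f"{relay}: {share:.1f}% of {POPULATION_2014[relay]} units due by 2019")

# Hypothetical example: an LFCB installed in 1995.
print(replacement_years("LFCB", 1995))
```

For the hypothetical 1995 LFCB, the window runs from 2005 (earliest) through 2015 (anticipated) to 2020 (latest).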
4.2.3. Relay Defect Data Analysis
The National Grid Protection Performance Information (PPI) reports from 2000 to 2013
[68] were reviewed to collect maloperation information for each relay type. A total
of 30 SHNB, THR or LFCB related maloperations were recorded during the
reporting period. Among them, 7 maloperations could not be attributed to a specific relay
type or cause. The maloperations caused by relay hardware failures were extracted and
studied in detail.
Table 4-3: Maloperations for each Relay Type from 2000-2013
Relay Type | SHNB | THR | LFCB | Unknown | TOTAL
All types of failures | 2 | 13 | 8 | 7 | 30
Relay hardware failures | 0 | 7 | 2 | - | 9
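Table 4-4: Causes of Relay Maloperations
Relay Type | Fault Type | Cause | No.
SHNB | Security-based misoperation | Application failure: wrong Zone 2 setting | 1
SHNB | Security-based misoperation | VT fuse failure (external instrument transformer problem) | 1
THR | Security-based misoperation | Power supply failure | 5
THR | Security-based misoperation | Card failure: comparator card | 1
THR | Security-based misoperation | VT fuse failure or Zone 2 card failure | 1
THR | Security-based misoperation | VT fuse failure | 1
THR | Security-based misoperation | Primary fault: lightning damages relay | 1
THR | Security-based misoperation | Application failure: wrong settings | 1
THR | Security-based misoperation | Unknown | 2
THR | Dependability-based misoperation | Unknown | 1
LFCB | Security-based misoperation | Faulty relay card | 1
LFCB | Security-based misoperation | Comms & Processor cards | 1
LFCB | Security-based misoperation | Unknown | 4
LFCB | Dependability-based misoperation | Unknown | 2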
It can be seen that SHNB has the best performance in terms of reliability, with only 2
maloperations in 14 years, whilst THR has the worst performance, with 13
maloperations. Of the two SHNB maloperations, one was caused by an incorrect Zone 2
setting, which is an application failure rather than a relay hardware failure. The other was
caused by a VT fuse failure and occurred because the VT monitoring function of the SHNB
had been manually blocked. Therefore, none of the in-service maloperations recorded in the
PPI is attributable to an SHNB hardware failure or defect. Accordingly, the records provide
no statistical evidence identifying any vulnerable components or modules in the SHNB
relays in service today.
A total of 13 THR in-service maloperations were tracked in the National Grid
PPI reports, and 7 of them were attributable to a hardware failure or defect. Five of the
seven hardware failures were due to the failure of a capacitor in the power supply module,
which was the most critical unit for the THR life extension. The failure of the power
supply capacitor causes unwanted tripping rather than an inability to trip, which is a design
characteristic of the THR. These problematic capacitors have already been replaced in
the power supply modules of all THR units in service, in accordance with the
Equipment Modification Instruction (EMI) 997 replacement procedure and program.
EMI 997 capacitor replacements were performed in the National Grid Light Current
Repair Centre (LCRC) via module rotations, rather than in the field. The
other two THR hardware failures were caused by random module component failures
with no emerging pattern.
Of the eight LFCB in-service maloperations recorded in the PPI, only two
were attributed to relay module failures, and neither has occurred recently. Furthermore,
National Grid Transmission Design Circular (TDC) 869 documents an LFCB voltage
regulator integrated circuit failure vulnerability, which has already been addressed by
replacing all the affected regulators.
4.2.4. Environmental Influence
The operating environment is another factor that significantly affects the ageing process
of protection and control devices. An appraisal of the environmental temperatures of
components during operation was performed based on ambient temperatures
collected over a three-year period (Mar 2000 to Mar 2003). Temperature loggers
designed to record hourly temperatures were placed close to batteries at 62 substations,
categorised into three regions: Leeds, Birmingham Weather Station and Heathrow
Weather Station. The results are summarised in Table 4-5.
Ambient temperatures at the substations in the Birmingham area were the highest,
exceeding 20°C for 74% of the time. In contrast, the ambient temperature in the Heathrow
weather station area stayed below 20°C for 80% of the time throughout the year.
Although the data show the overall maximum and minimum temperatures, it is impossible,
from the data available, to assess the diurnal fluctuations in ambient temperature
at each sampling location. Thus a proper evaluation of this effect on component and
relay reliability cannot be made. Notwithstanding this obstacle, the maximum and
minimum temperatures and the temperature variations recorded are judged not to
affect component reliability and lifetime disproportionately, as they fall well within the
acceptable ambient operating temperature limits of the individual components.
Table 4-5: Summary of Ambient Temperatures Recorded over a Period of One Year
Area | Smallest fluctuation ΔT (K) | Largest fluctuation ΔT (K) | Lowest temperature (°C) | Highest temperature (°C)
Leeds | 9 (CHTE 275 Battery Room Amb) | 28 (Staythorpe) | +4 | +36
Birmingham Weather Station | 3 (Willington 275 Battery Rm Amb) | 28 (Enderby Room ambient; Seabank outdoor ambient) | +5 | +42
Heathrow Weather Station Area | 4 (Leatherhead JFHTC Room power equipt) | 27 (BRWE1, Diesel House Ambient) | +1 | +30
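The judgement that the recorded extremes fall well within component operating limits can be made explicit as a margin check. The sketch below uses the lowest and highest temperatures from Table 4-5; the rated ambient operating range of -25 °C to +55 °C is an assumption standing in for the actual datasheet limits of each relay and component, which are not given here, and the function name is hypothetical. Note also that the loggers measured room ambient close to batteries, not temperatures inside the relay cases.

```python
# Lowest and highest recorded ambient temperatures (degC) from Table 4-5.
RECORDED = {
    "Leeds": (4.0, 36.0),
    "Birmingham Weather Station": (5.0, 42.0),
    "Heathrow Weather Station Area": (1.0, 30.0),
}

# ASSUMPTION: hypothetical rated ambient operating range (degC); replace with
# the datasheet limits of the relay or component under study.
RATED_MIN_C, RATED_MAX_C = -25.0, 55.0

def temperature_margins(recorded, rated_min, rated_max):
    """Return {area: (margin above rated min, margin below rated max)} in kelvin."""
    return {area: (low - rated_min, rated_max - high)
            for area, (low, high) in recorded.items()}

margins = temperature_margins(RECORDED, RATED_MIN_C, RATED_MAX_C)
for area, (cold, hot) in margins.items():
    status = "within limits" if cold >= 0 and hot >= 0 else "OUTSIDE limits"
    print(f"{area}: {cold:.0f} K above min, {hot:.0f} K below max ({status})")
```

Under this assumed range the tightest margin is at the hot end in the Birmingham area, which is consistent with that region recording the highest temperatures.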
4.3. Laboratory Evaluation Results on Selected Relays
In this section, the previously described Asset Life Extension (ALE) process is
applied to selected relay types. The laboratory evaluation results are then analysed
and interpreted to evaluate the condition of the protection devices and to issue life
extension processes and action plans. The source and commissioning history of the
evaluated relay samples are shown in Table 4-6. For each relay type studied, samples
with different in-service times were removed from the system and used for the laboratory
study.
Table 4-6: Relay Samples used for Laboratory Testing
Type | Serial No. | Original Location | Commissioned | Replaced | Service Age (years)
SHNB 101 | 002838 P | Ex Whitson 275kV Cardiff East-Uskmouth Circuit FPFM | 1986 (estimated) | 2011 | 26
SHNB 102 | 784167 D | Ex Upper Boat 275 S.Stn - Cilfynydd 2 Circuit FPSM | 1993 | 2005 | 13
THR | 97434/1 | Ex Berkswell 275kV S.Stn - Feckenham Circuit FPSM | 1980 (estimated) | 2012 | 33
LFCB 103 | 208284 J | Ex Creyke Beck 400kV Keadby-Killingholme circuit FPFM | 1998 | 2006 | 9
LFCB 103 | 547373 C | Ex Greystones B 275 S.Stn - Lackenby 3 Circuit FPFM | 1991 | 2006 | 16
The evaluation encompassed tests on two different versions of the SHNB MICROMHO
static distance relay (SHNB 101 (silver) and SHNB 102 (black)). As shown in Table 4-6,
a heavily used SHNB 101 relay sample with an in-service time of 25 years, from 1986
to 2011, and an SHNB 102 relay with a shorter in-service time of 12 years were used
for the characterisation of operational behaviour, thermal imaging and in-depth
component study. The THR relay used for the life extension study was commissioned in
1980 and had an in-service time of approximately 33 years. According to National
Grid's current replacement policy for the THR relay, the anticipated life expectancy is
25 years. Therefore, the operational performance and the component condition of
the relay were checked to identify any signs of ageing related degradation.
Two LFCB 103 differential relay samples with different in-service times were tested.
The heavily used relay sample has an in-service time of 16 years from 1991 to 2006.
The lightly used LFCB sample has a shorter in-service time of 9 years.
4.3.1. Laboratory Inspection
The conditions of the relay samples removed from service are first examined.
Disassembled modules of each relay type are visually inspected to check for cracks,
loose or damaged interconnections, heat damage, and signs of corrosion or
contamination. An overview of the overall product, modules, construction components
and technologies for each relay type is also provided.
4.3.1.1. SHNB Visual Inspection Results
The SHNB relay consists of 32 modules, which are made up of printed circuit boards
(PCBs) with discrete components including transistors, voltage regulators, logic array
integrated circuits, operational amplifiers, variable resistors, reed relays, diodes,
wire-wound high power resistors, miscellaneous film resistors, and electrolytic and film
capacitors. An example of the SHNB relay and its zone comparator module PCB is
illustrated in Figure 4-4. The modules are retractable, which made thermal imaging
convenient: extender cards were used to draw out the modules during a short
period of operation (from a "cold" start).
Several components were found to differ between the two relays in terms of
packaging design. The older SHNB 101 design, in particular, contained a number of
obsolete components, and in some cases it was not possible to obtain datasheets for
these. However, both SHNB relay types are identical in their construction and have
nearly identical component layouts on equivalent boards. Overall, all the SHNB relays
examined had a generally good appearance with few scratches on the exterior. The
PCBs within each module appeared in good condition, and the protective conformal
coating on the boards had a good level of sheen with no obvious coating or component
discoloration. No obvious cracks or loose or damaged interconnections were
identified. There were no obvious signs of heat damage or corrosion, even on the
components that operate at above-ambient temperatures. Thus, the visual inspection
revealed no obvious targets for further evaluation.
Figure 4-4: SHNB Relay (left) and Its Comparator Module PCB (right)
The wire-wrap technology used in the SHNB to interconnect circuit modules, without
the need for soldered wires or fabricated backplanes with connectors, was popular for
the manufacture of electronic equipment in the 1970s and 1980s. It provides more
reliable construction than other interconnection methods: the stripped connecting wire
is wrapped under tension around a square post, and the sharp corners of the post bite
into the wire to yield 20 to 40 airtight, high-pressure metal contact points in parallel. The
connections are therefore less likely to fail due to vibration or physical stress. Because
no soldering is involved, solder-related problems such as corrosion, cold joints and dry
joints that become intermittent are avoided. Positive industry experience with wire-wrap
assembly is consistent with the observation that National Grid has experienced no
wire-wrap failures in any of its SHNB relays.
4.3.1.2. THR Visual Inspection Results
The type THR is a multi-zone distance relay manufactured by NEI Reyrolle in Hebburn
(now part of Siemens) and installed in National Grid substations between 1980 and
1990. The THR is based on an early-1970s discrete-transistor circuit design. The
heavily used THR relay, with 33 years of in-service time, presents a generally aged and
tired appearance (e.g. scratches on the exterior, distorted and threaded screws and
washers, and dust). There is some discoloration of the protective lacquer on the PCBs,
which appears uneven in places and may be due to the lacquer having been applied
manually. No obvious cracks or loose or damaged interconnections were identified on
the boards, and there are no obvious signs of failure in the conformal coating (lacquer).
There are, however, signs of heat damage on a few resistors.
Figure 4-5: THR Relay (left) and Its Internal PCBs (right)
4.3.1.3. LFCB Visual Inspection Results
The LFCB is a transmission line current-differential relaying system based on
microprocessor technology with analogue-to-digital converters. These relays were
manufactured by GEC, Alstom or Areva in Stafford and supplied to National Grid
between 1993 and 2005. Both LFCB relay samples present a good appearance. The
protective lacquer on the disassembled boards is in good condition, and the boards
show no obvious cracks, loose or damaged interconnections, heat damage, signs of
corrosion or whisker growth.
Figure 4-6: LFCB Relay (left) and Its Internal PCBs (right)
4.3.2. Fingerprint Performance Testing
In this section, the operational behaviour of the protective relays is tested to detect
any degradation in relay function relative to the design specifications. The operating
time and the reach accuracy of each relay type are examined. The operating
characteristics are then compared with contemporary replacement relay types to
determine whether new relays would offer a meaningful performance improvement that
could influence the replacement decision process.
4.3.2.1. Fingerprint Testing Methodologies
An Omicron CMC 256 test set was used to test the operational performance of the
distance and differential relays. The settings of each relay type and the parameters of the
protected circuits in the 400 kV transmission system were provided by National Grid
and are given in Appendix A. The methodologies used to test each relay type are
described as follows:
1) Distance Relay (SHNB, THR) Testing Method:
The operational performance of the distance relays is tested via two different
approaches, namely static and dynamic fault-based testing. For static fault-based testing,
the Omicron ‘Distance Relay’ test module is used to simulate different types of fault (i.e.
phase to ground, phase to phase and three phase faults). All these faults are
automatically simulated by the test set with a constant fault current of 2 A and injected
into the relays. The relay reach can be measured by placing test points along the relay
characteristic angle and at the edges of each operating zone, as shown in Figure 4-7;
the green dots are the test points designed to evaluate the reach accuracy, sensitivity
and operating time of relays with a Mho characteristic. This method is used by National
Grid for its routine tests.
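As an illustration of what the static test points probe, the following minimal Python sketch models a self-polarised mho zone as a circle whose diameter runs from the origin to the zone reach point along the characteristic angle, and checks test impedances placed along that angle. The circle model, the 8 Ω reach and the 80° angle are hypothetical assumptions for illustration only; they are not the SHNB or THR polarising scheme or the National Grid settings in Appendix A.

```python
import cmath
import math


def mho_reach_point(reach_ohm: float, rca_deg: float) -> complex:
    """Zone reach point on the R-X plane (hypothetical setting values)."""
    return cmath.rect(reach_ohm, math.radians(rca_deg))


def inside_mho(z: complex, z_reach: complex, tol: float = 1e-9) -> bool:
    """True if z lies inside or on a mho circle whose diameter is the
    segment from the origin to z_reach (simple self-polarised model)."""
    centre = z_reach / 2
    return abs(z - centre) <= abs(z_reach) / 2 + tol


def points_along_rca(z_reach: complex, fractions):
    """Test impedances on the characteristic angle at given fractions of reach."""
    return [f * z_reach for f in fractions]


if __name__ == "__main__":
    z1 = mho_reach_point(reach_ohm=8.0, rca_deg=80.0)  # hypothetical Zone 1
    for z in points_along_rca(z1, [0.5, 0.95, 1.05]):
        print(f"|Z| = {abs(z):.2f} ohm -> operates: {inside_mho(z, z1)}")
```

Points just inside and just outside the reach (95% and 105% here) bracket the zone boundary, which is how a reach measurement localises the operating edge.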
The limitation of static testing is that it uses a fixed injected current and a stable voltage
to test the relay. Consequently, it cannot reproduce the actual voltage and current
waveforms seen during a fault. Therefore, a second approach is also used. As shown
in Appendix A, the PSCAD simulator was used to model the double circuit
transmission system using the line parameters provided by National Grid. Different
types of fault are applied at different positions along the transmission line. The
transient fault current and voltage signals at the relay location are recorded, saved in
a COMTRADE file, and then replayed by the Omicron test set and injected into the
distance relay to evaluate its performance.
Figure 4-7: Static Fault based Distance Relay Testing in Omicron ‘Distance Relay’
Module
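The difference between the two approaches can be seen in the current waveform itself. The sketch below samples a textbook single-phase RL fault current with a decaying DC offset; it is not the PSCAD double-circuit model used in this work, and the X/R ratio, fault inception angle, amplitude and 4 kHz sampling rate are hypothetical. A static test, by contrast, injects a constant phasor with no offset.

```python
import math


def rl_fault_current(t: float, i_peak: float, x_over_r: float,
                     inception_deg: float, f_hz: float = 50.0) -> float:
    """Textbook RL fault current with decaying DC offset:
    i(t) = Ip*[sin(wt + a - phi) - sin(a - phi)*exp(-t/tau)],
    with phi = atan(X/R) and tau = (X/R)/w. The current is zero at t = 0."""
    w = 2 * math.pi * f_hz
    phi = math.atan(x_over_r)
    tau = x_over_r / w
    a = math.radians(inception_deg)
    return i_peak * (math.sin(w * t + a - phi)
                     - math.sin(a - phi) * math.exp(-t / tau))


def sample_fault(duration_s: float = 0.1, fs_hz: float = 4000.0, **kwargs):
    """Sample the waveform at fs_hz, e.g. before export for replay."""
    n = int(round(duration_s * fs_hz))
    return [rl_fault_current(k / fs_hz, **kwargs) for k in range(n)]


if __name__ == "__main__":
    xr = 20.0  # hypothetical X/R ratio
    phi_deg = math.degrees(math.atan(xr))
    worst = sample_fault(i_peak=1.0, x_over_r=xr, inception_deg=phi_deg - 90.0)
    print(f"peak with full DC offset: {max(worst):.2f} x Ip")
```

With the inception angle chosen for maximum offset, the first peak approaches twice the symmetrical value, which a fixed-phasor static test never exercises.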
2) Differential Relay (LFCB) Testing Method:
Figure 4-8 illustrates the dual-slope biased restraint characteristic of the LFCB current
differential relay. For a two-ended line with ends A and B, let $\bar{I}_{A,a}$ and
$\bar{I}_{B,a}$ be the time-aligned a-phase current phasors at ends A and B at a particular
instant. The differential and bias currents are calculated as:

$$ I_{diff,a} = \left| \bar{I}_{A,a} + \bar{I}_{B,a} \right| \tag{4-2} $$

$$ I_{bias,a} = \tfrac{1}{2}\left( \left| \bar{I}_{A,a} \right| + \left| \bar{I}_{B,a} \right| \right) \tag{4-3} $$

The protection characteristic is determined by four settings: the basic differential current
setting $I_{S1}$, which determines the minimum pick-up level of the relay; the lower
percentage bias setting $k_1$; the bias current threshold setting $I_{S2}$; and the higher
percentage bias setting $k_2$. The tripping criteria can be formulated as:

For $I_{bias} < I_{S2}$:
$$ I_{diff} > k_1 I_{bias} + I_{S1} \tag{4-4} $$

For $I_{bias} \geq I_{S2}$:
$$ I_{diff} > k_2 I_{bias} - (k_2 - k_1) I_{S2} + I_{S1} \tag{4-5} $$
Figure 4-8: LFCB Dual Slope Bias Characteristics
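Equations (4-2) to (4-5) can be expressed as a short Python sketch of the per-phase trip decision. The slopes k1 = 30% and k2 = 150% are the settings quoted for the tested LFCBs; the values IS1 = 0.2 A and IS2 = 2.0 A used in the example are hypothetical, and the sign convention (both end currents measured flowing into the protected line) is an assumption.

```python
def diff_and_bias(i_a: complex, i_b: complex) -> tuple[float, float]:
    """Eqs. (4-2) and (4-3) for one phase of a two-ended line.
    Both end currents are assumed to be measured flowing into the line."""
    i_diff = abs(i_a + i_b)
    i_bias = 0.5 * (abs(i_a) + abs(i_b))
    return i_diff, i_bias


def dual_slope_trip(i_diff: float, i_bias: float, is1: float, is2: float,
                    k1: float, k2: float) -> bool:
    """Dual-slope biased criterion of Eqs. (4-4) and (4-5)."""
    if i_bias < is2:
        threshold = k1 * i_bias + is1
    else:
        threshold = k2 * i_bias - (k2 - k1) * is2 + is1
    return i_diff > threshold


if __name__ == "__main__":
    settings = dict(is1=0.2, is2=2.0, k1=0.30, k2=1.50)  # IS1, IS2 hypothetical
    through_load = diff_and_bias(1.0 + 0j, -1.0 + 0j)   # current passes through
    internal_fault = diff_and_bias(1.0 + 0j, 1.0 + 0j)  # fed from both ends
    print(dual_slope_trip(*through_load, **settings))    # False
    print(dual_slope_trip(*internal_fault, **settings))  # True
```

The two branches meet at $I_{bias} = I_{S2}$, so the characteristic is continuous; the steeper upper slope provides extra restraint against CT saturation on heavy through faults.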
The loopback commissioning test is used to test the performance of the differential
protection function of the LFCB relay. The MITZ 03, a stand-alone fibre-optic to
electrical communications interface unit, is switched to the ‘X.21 Loopback’ option
to loop back the X.21 communication signals for relay testing.
Figure 4-9: Connections for LFCB Bias Characteristic Testing
As indicated in Figure 4-9, the loopback connection feature of the LFCB relay allows
the bias characteristic to be determined by injecting a bias current into one phase of the
relay and a differential current into another phase. The relay uses the higher of the two
input currents as the bias current. By slowly increasing the current in the other phase
until the associated phase contact operates, the threshold differential current at that point
can be found. The method is applied to check the dual-slope characteristic of each phase
and is sufficient to fully check the functionality of each module in the LFCB. The
percentage bias settings below and above IS2 (k1 and k2 respectively) are both tested.
The minimum operating current and the minimum operating time of each phase are also
recorded. Detailed settings of the LFCB relay can be found in Appendix A.
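The calculated K1 and K2 values reported for the LFCB and P545 tests are slopes of the measured characteristic. The exact calculation used is not described here; one plausible approach, assumed for illustration, is a least-squares fit through the measured (bias current, differential current) threshold points that lie within one region of the characteristic, below IS2 for k1 and above it for k2. The example points are hypothetical.

```python
def bias_slope_percent(points):
    """Least-squares slope, in %, of measured (I_bias, I_diff) threshold
    points that all lie in one region of the characteristic."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_b = sum(b for b, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((b - mean_b) ** 2 for b, _ in points)
    if sxx == 0:
        raise ValueError("bias currents must not all be equal")
    sxy = sum((b - mean_b) * (d - mean_d) for b, d in points)
    return 100.0 * sxy / sxx


if __name__ == "__main__":
    # Hypothetical thresholds: lower slope below IS2 = 2 A, upper slope above it
    lower = [(0.5, 0.35), (1.0, 0.50), (1.5, 0.65)]
    upper = [(3.0, 2.30), (4.0, 3.80), (5.0, 5.30)]
    print(f"k1 ~ {bias_slope_percent(lower):.1f}%")
    print(f"k2 ~ {bias_slope_percent(upper):.1f}%")
```

Fitting several points rather than two reduces the influence of a single noisy threshold reading.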
4.3.2.2. Fingerprint Testing and Comparison with Contemporary Replacement
Relays
The described fingerprint testing process is performed on the selected relay samples.
The operating characteristics are then compared with a contemporary replacement relay
(i.e. Alstom P545) to determine if modern relays offer a performance improvement in
terms of operational speed and accuracy that could influence the replacement decision.
1) SHNB Testing Results:
The operational behaviour of the heavily used SHNB 101 (26 years in-service time),
the lightly used SHNB 102 (13 years in-service time) and a modern Alstom P545 relay
was tested using both static and dynamic fault-based tests. The SHNB relay has three
comparator modules for three protection zones. Each comparator module contains six
comparators, covering three phase-to-phase faults (A-B, A-C and B-C) and three
phase-to-ground faults (A-E, B-E and C-E). To fully test the relay function, the
protection performance for each protection zone and each fault type needs to be
evaluated. The reach accuracy and operating time of Zone 1, Zone 2, Zone 3 and Zone 3
offset are shown in Table 4-7 and Table 4-8.
The testing results verify that all the relays operate as designed under both static
and dynamic fault testing. The reach and operating time of each zone under different
types of fault are accurate. No significant signs of degradation in relay function can be
identified, even on the relay whose in-service time exceeds its anticipated lifetime. The
tested relays show similar operational behaviour to the contemporary replacement
Alstom P545 relays in terms of reach accuracy and operating speed.
Table 4-7: Fingerprint Testing Results for Static Faults
| Fault | Zone | SHNB 101 (26 years) Reach | SHNB 101 Op Time | SHNB 102 (13 years) Reach | SHNB 102 Op Time | Alstom P545 Reach | Alstom P545 Op Time |
| A-E | Zone 1 | 80% | 15.3 ms | 79% | 14.6 ms | 78% | 18.6 ms |
| A-E | Zone 2 | 150% | 517.7 ms | 148% | 515.2 ms | 148% | 517 ms |
| A-E | Zone 3 | 199% | 1030 ms | 199% | 1027 ms | 197% | 1017 ms |
| A-E | Zone 3 Offset | -16% | 1024 ms | -16% | 1023 ms | -16% | 1015 ms |
| A-B | Zone 1 | 81% | 12.3 ms | 80% | 13.1 ms | 79% | 17.4 ms |
| A-B | Zone 2 | 153% | 510.9 ms | 151% | 512.3 ms | 150% | 517 ms |
| A-B | Zone 3 | 202% | 1017 ms | 202% | 1020 ms | 199% | 1015 ms |
| A-B | Zone 3 Offset | -16% | 1023 ms | -16% | 1020 ms | -16% | 1021 ms |
Table 4-8: Fingerprint Testing Results for Dynamic Faults
| Fault | Zone | SHNB 101 (26 years) Reach | SHNB 101 Op Time | SHNB 102 (13 years) Reach | SHNB 102 Op Time | Alstom P545 Reach | Alstom P545 Op Time |
| A-E | Zone 1 | 82% | 13.1 ms | 81% | 14.9 ms | 79% | 16.7 ms |
| A-E | Zone 2 | 149% | 517.7 ms | 148% | 520.3 ms | 150% | 512.3 ms |
| A-E | Zone 3 | 212% | 1024 ms | 212% | 1021 ms | 211% | 1016 ms |
| A-E | Zone 3 Offset | -12% | 1030 ms | -10% | 1036 ms | -15% | 1022 ms |
| A-B | Zone 1 | 84% | 16.7 ms | 84% | 14.9 ms | 80% | 16.7 ms |
| A-B | Zone 2 | 154% | 515.9 ms | 153% | 511.3 ms | 154% | 512.3 ms |
| A-B | Zone 3 | 205% | 1054 ms | 204% | 1045 ms | 205% | 1016 ms |
| A-B | Zone 3 Offset | no trip | - | no trip | - | no trip | - |
It is worth noting that the Zone 3 offset results differ significantly between static and
dynamic testing. With a fixed injected current and a stable voltage, static testing is not
sufficient to fully test the “polarising” function of the protection device. The SHNB relay
uses a “partial cross polarisation” signal to provide a directional reference for the relay
comparators. During a single phase to ground fault (e.g. an A-E fault), the phase angle of
the faulted phase voltage (VA) can be derived from the two healthy phase voltages,
since under balanced conditions VA = −(VB + VC). For a close-in two-phase or
three-phase fault, the memory voltage signals (i.e. VMA, VMB and VMC) obtained under
healthy live-line conditions are used as polarising signals. With an 11-cycle memory
length (220 ms for a 50 Hz system), the memory polarising signal lasts long enough for
Zone 1 protection to clear close-in faults. However, with the Zone 3 time delay set to
1000 ms, the memory polarising signal times out before the Zone 3 timer expires, and
the Zone 3 offset element is then blocked. Consequently, during dynamic testing, no trip
was detected when testing the Zone 3 offset reach for the A-B fault.
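The timing argument can be checked with simple arithmetic. The sketch below assumes, as a simplification, that a zone relying solely on memory polarisation can only operate if its operating time is shorter than the memory length; the 11-cycle memory and 50 Hz frequency come from the text, while the function names are hypothetical.

```python
def memory_length_ms(cycles: int = 11, f_hz: float = 50.0) -> float:
    """Duration of the voltage memory in milliseconds."""
    return cycles * 1000.0 / f_hz


def operates_on_memory(operate_time_ms: float, cycles: int = 11,
                       f_hz: float = 50.0) -> bool:
    """Simplified check: a zone that relies only on memory polarisation can
    operate only if its operating time is shorter than the memory length."""
    return operate_time_ms < memory_length_ms(cycles, f_hz)


if __name__ == "__main__":
    print(memory_length_ms())          # 220.0
    print(operates_on_memory(15.3))    # Zone 1 (Table 4-7 time): True
    print(operates_on_memory(1000.0))  # Zone 3 timer setting: False
```

The A-E offset result differs because its cross-polarising reference comes from the healthy phases rather than from the time-limited memory.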
The testing results indicate that the SHNB relay samples offer protection performance
equivalent to that of the modern relay. They also show that static fault-based testing,
which National Grid normally uses for routine tests, may not be sufficient to fully test
all protection functions (e.g. the polarising module).
2) THR Testing Results:
The operational performance of a heavily used THR relay with an in-service time of 33
years was compared with that of a modern numerical Alstom P545 distance relay. The
tested relay shows similar operational behaviour to the modern numerical relay, with no
signs of degradation in operational function. The reach and operating time of each zone
under different types of fault are accurate and as set. Replacing the THR with modern
equipment is therefore unlikely to offer a performance improvement.
3) LFCB Testing Results:
X.21 loopback testing was carried out to test the operational behaviour of two LFCB
relay samples with different in-service times. Based on the test results, both the
heavily used and the lightly used LFCB samples are operating as designed. The
measured percentage bias slopes k1 and k2 for each phase agree closely with the settings
(k1 = 30%, k2 = 150%). The fast operating times of the LFCBs ensure that the relay can
detect the fault and issue a trip in a timely manner, with operating times for all three
phases within 25 ms. No significant signs of degradation can be identified in the
operational performance.
Table 4-9: LFCB 103 (208284J) (9 years in-service time) Testing Results
| Tested Phase | Calculated K1 | Calculated K2 | Min Op. Level | Min Op. Time |
| A Phase | B Ph: 28.7%; C Ph: 29.6% | B Ph: 148.9%; C Ph: 149.4% | 0.123 A | 22.6 ms |
| B Phase | A Ph: 29.4%; C Ph: 30.1% | A Ph: 151.1%; C Ph: 149.8% | 0.126 A | 22.9 ms |
| C Phase | A Ph: 30.4%; B Ph: 31.5% | A Ph: 150.7%; B Ph: 151.1% | 0.121 A | 22.6 ms |
Table 4-10: LFCB 103 (547373C) (16 years in-service time) Testing Results
| Tested Phase | Calculated K1 | Calculated K2 | Min Op. Level | Min Op. Time |
| A Phase | B Ph: 29.4%; C Ph: 29.4% | B Ph: 150.6%; C Ph: 150.0% | 0.116 A | 21.3 ms |
| B Phase | A Ph: 31.8%; C Ph: 31.8% | A Ph: 150.0%; C Ph: 150.0% | 0.115 A | 21.2 ms |
| C Phase | A Ph: 30.6%; B Ph: 30.6% | A Ph: 150.6%; B Ph: 150.6% | 0.112 A | 20.3 ms |
The operational performance of the LFCBs is next compared with that of the modern
numerical Alstom P545 relay in terms of the differential protection characteristic. The
results in Table 4-11 confirm that the LFCB relay provides its designed protection
function with accuracy similar to that of the modern numerical differential relay. The
LFCB minimum operating times (20.3 to 22.9 ms) are slightly shorter than those of the
P545 (25.9 to 26.8 ms), but this difference is not important as long as timely operation
is provided. Since the protection performance of the LFCB is equivalent to that of its
contemporary replacement, replacing the LFCB with modern equipment is not likely to
offer a performance improvement.
Table 4-11: Alstom P545 Differential Characteristic Testing Results
| Tested Phase | Calculated K1 | Calculated K2 | Min Op. Level | Min Op. Time |
| A Phase | B Ph: 30.1%; C Ph: 30.1% | B Ph: 149.1%; C Ph: 149.8% | 0.12 A | 25.9 ms |
| B Phase | A Ph: 30.1%; C Ph: 30.1% | A Ph: 151.2%; C Ph: 150.8% | 0.12 A | 26.5 ms |
| C Phase | A Ph: 30.1%; B Ph: 30.4% | A Ph: 149.9%; B Ph: 149.2% | 0.12 A | 26.8 ms |
4.3.2.3. Voltage Transformer Supervision Function Testing for Distance Protection
According to the PPI report, one of the two recorded SHNB maloperations was caused
by a VT fuse failure. During a VT fuse failure, the distance protection measures zero
voltage on one or more phases; with load current still flowing, the measured impedance
V/I on the affected loop collapses towards zero and can be interpreted as a close-in
fault, resulting in an unwanted trip. The distance protection must therefore detect a
voltage drop caused by a short circuit or open circuit in the VT circuit and block tripping.
Accordingly, the voltage transformer supervision (VTS) function of the SHNB was
tested to check whether any degradation in this function could have caused the
maloperation.
The relay measures the negative phase sequence (NPS) components of the line voltage
and current signals to detect a voltage failure. Under normal system conditions, both the
NPS voltage and current are below their thresholds. During an unbalanced fault on the
primary transmission system, both the NPS voltage and current rise above their
thresholds. When a phase voltage is lost because of a VT fuse failure, an NPS voltage
appears; however, the currents remain nearly balanced, so there is no NPS current. The
SHNB VTS function therefore operates on detection of NPS voltage without NPS
current, blocking tripping when there is a VT fuse failure.
The voltage waveform during a VT fuse failure was simulated using the Omicron
software by reducing the voltage of one or more phases to zero at the instant of the
simulated fuse failure. With the SHNB relay VTS inhibition function enabled, the VTS
behaviour was observed to be as follows:
1) When a single phase voltage (or two phase voltages) measured by the relay
becomes zero, the relay VTS module detects the voltage failure, blocks relay
tripping and raises an alarm.
2) When all three phase voltages fail simultaneously, the VTS module does not
respond, because a balanced loss of all three voltages does not produce an NPS
voltage. This is not a practical disadvantage because of the extremely low
probability of such a failure (i.e. three VT fuses blowing simultaneously).
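The VTS principle can be illustrated with symmetrical components. The following sketch computes the NPS voltage and current from phase phasors for ABC rotation and applies a simplified "NPS voltage without NPS current" rule. The per-unit thresholds (0.1 and 0.05) and the test phasors are hypothetical and are not SHNB settings; the real relay logic also includes timers and alarm outputs that are not modelled here.

```python
import cmath
import math

A = cmath.rect(1.0, math.radians(120.0))  # operator a = 1 at 120 degrees


def nps(xa: complex, xb: complex, xc: complex) -> complex:
    """Negative phase sequence component, ABC rotation: (Xa + a^2 Xb + a Xc)/3."""
    return (xa + A * A * xb + A * xc) / 3


def balanced(mag: float):
    """Balanced positive-sequence set Xa, Xb, Xc."""
    return tuple(cmath.rect(mag, math.radians(ang)) for ang in (0.0, -120.0, 120.0))


def vts_blocks(v, i, v2_set: float = 0.1, i2_set: float = 0.05) -> bool:
    """Simplified VTS rule: block tripping on NPS voltage without NPS current.
    Thresholds are hypothetical per-unit values, not SHNB settings."""
    return abs(nps(*v)) > v2_set and abs(nps(*i)) < i2_set


if __name__ == "__main__":
    v, i = balanced(1.0), balanced(0.5)
    print(vts_blocks(v, i))                 # healthy: False
    print(vts_blocks((0j, v[1], v[2]), i))  # one VT fuse blown: True
    print(vts_blocks((0j, 0j, 0j), i))      # all three lost: False
```

The last case reproduces the observed blind spot: a simultaneous loss of all three voltages produces no NPS voltage, so the rule does not pick up.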
It is therefore reasonable to conclude that the two SHNB distance relays under test
can detect loss-of-voltage conditions and block tripping. The maloperation caused by
the VT fuse failure was due to an application problem: the VTS function of that
particular National Grid SHNB relay had been disabled.
4.3.3. Stress Testing-Simulated In-service Conditions
Stresses that could affect relay life under normal operating conditions were also
evaluated. The following tests were performed:
1) Input CT thermal stress test
2) Power supply component voltage stress test
3) Auxiliary energizing quantities stress test
4.3.3.1. Input CT Thermal Stress Test
Any increase in the system current level during normal load and fault conditions
might result in continuous thermal stress on the input isolating CTs. Consequently, the
maximum fault and load currents in the system, as well as the rated overload
capabilities of the relay CTs, are examined to check whether the resulting thermal stress
could damage the relay CTs. The maximum fault currents on the 275 kV and 400 kV
systems were obtained from system measurements:
400 kV Network: CT Ratio: 2000:1; Maximum Fault Level: 63 kA;
Maximum Secondary Current: 31.5 A
275 kV Network: CT Ratio: 1200:1; Maximum Fault Level: 40 kA;
Maximum Secondary Current: 33.3 A
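The maximum secondary currents above are simply the maximum fault level divided by the CT ratio, as this short sketch reproduces (the function name is illustrative).

```python
def secondary_current(primary_a: float, ct_primary_a: float,
                      ct_secondary_a: float = 1.0) -> float:
    """Secondary current for a CT of ratio ct_primary_a:ct_secondary_a."""
    return primary_a * ct_secondary_a / ct_primary_a


if __name__ == "__main__":
    print(f"400 kV: {secondary_current(63_000, 2000):.1f} A")  # 31.5 A
    print(f"275 kV: {secondary_current(40_000, 1200):.1f} A")  # 33.3 A
```

The lower-ratio 275 kV CTs therefore see the higher secondary current, even though the 400 kV fault level is larger.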
The National Grid Technical Guidance Note (TGN) [69], which specifies the overload
capabilities required under heavy-load system conditions, was reviewed; the relevant
ratings are given in Table 4-12.
Table 4-12: Ratings and Assessed Overload Capabilities of Protective Relays
| Protection Type | Rated Current (IR) | Max Continuous Capability | Initial Load | Short-Term Overload: 2 min | 3 min | 5 min | 10 min | 20 min |
| SHNB | 1 A | 3×IR | 3 A | 6 A | 5 A | 4 A | 3.5 A | 3 A |
| SHNB | 5 A | 3×IR | 15 A | 15 A | 15 A | 15 A | 15 A | 15 A |
| THR | 1, 2 & 5 A | 2.2×IR | 2.2×IR | 3.22×IR | 2.6×IR | 2.2×IR | 2.2×IR | 2.2×IR |
| LFCB | 1 A | 4×IR | 2 A | 6 A | 5 A | 4 A | 4 A | 4 A |
The maximum fault current and overload currents in the system are then compared with
the thermal rating of the relay CTs, as specified in the relay manuals:
SHNB:
AC input current rating: IR = 1 A
(3×In continuously), (57.7×In for 3 s), (100×In for 1 s).
THR:
AC current input rating: IR = 2 A
Maximum continuous current: 2.2×nominal rating
Short-time current rating (2 s): 50×nominal rating; 25×nominal for the maximum
coarse setting.
LFCB:
AC input current rating: IR = 1 A
(4×In continuously), (100×In of 400 A for 1 s).
During a system fault, the maximum fault current on the relay secondary side can reach
33.3 A. For all three relay types, the isolating CTs can withstand this maximum
fault current for more than 1 second, which is long enough for a circuit breaker to clear
the fault, even when the fault is cleared by back-up protection. Consequently, the
isolating CTs will not be damaged by the fault current.
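The margin behind the "more than 1 second" statement can be estimated by assuming, as is common for short-time thermal ratings but not stated in the relay manuals, that the withstand follows a constant I²t law. Under that assumption the SHNB ratings of 57.7×In for 3 s and 100×In for 1 s are self-consistent (both about 10 000 A²s for In = 1 A), and the permissible duration at 33.3 A can be computed as follows.

```python
def withstand_time_s(rated_a: float, multiple: float, rated_time_s: float,
                     fault_a: float) -> float:
    """Permissible duration at fault_a, scaled from a short-time rating of
    multiple x rated_a for rated_time_s, assuming constant I^2 t."""
    i2t = (multiple * rated_a) ** 2 * rated_time_s
    return i2t / fault_a ** 2


if __name__ == "__main__":
    fault = 33.3  # maximum secondary fault current, A
    print(f"SHNB: {withstand_time_s(1.0, 100, 1.0, fault):.1f} s")  # 9.0 s
    print(f"THR:  {withstand_time_s(2.0, 50, 2.0, fault):.1f} s")   # 18.0 s
    print(f"LFCB: {withstand_time_s(1.0, 100, 1.0, fault):.1f} s")  # 9.0 s
```

Under this assumption all three relay types tolerate the worst-case secondary fault current for several seconds, comfortably beyond back-up clearance times.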
Under heavy load conditions, according to the load encroachment data, the maximum
system loading considered for protection settings is 5360 A for the 400 kV system and
4490 A for the 275 kV system. The maximum load current is therefore 2.7 A on the
secondary of a 400 kV CT and 3.7 A on the secondary of a 275 kV CT. According to the
TGN requirements, any period during which the load current exceeds three times the
rated current should last no more than 2 minutes, which ensures that none of the relay
CTs will be thermally damaged under heavy load conditions. It can be concluded that
the forecast load and fault currents will not cause thermal damage to the relay isolating
CTs. Therefore, the end of life of the relay CTs is unlikely to be accelerated by thermal
stress.
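For a 1 A SHNB, the 3.7 A load on a 275 kV CT exceeds the 3 A continuous capability, so the short-term ratings in Table 4-12 apply. The sketch below looks up the longest tabulated duration whose capability covers a given current; it assumes no interpolation between tabulated points and ignores the initial-load column, both simplifications.

```python
import math

# Table 4-12, SHNB with IR = 1 A: (duration in minutes, permitted current in A)
SHNB_1A_SHORT_TERM = [(2, 6.0), (3, 5.0), (5, 4.0), (10, 3.5), (20, 3.0)]
SHNB_1A_CONTINUOUS_A = 3.0  # 3 x IR


def max_overload_minutes(load_a, continuous_a=SHNB_1A_CONTINUOUS_A,
                         short_term=SHNB_1A_SHORT_TERM):
    """Longest tabulated duration whose capability covers load_a.
    Returns math.inf within the continuous rating and None if even the
    shortest tabulated capability is exceeded. No interpolation."""
    if load_a <= continuous_a:
        return math.inf
    allowed = [minutes for minutes, cap in short_term if load_a <= cap]
    return max(allowed) if allowed else None


if __name__ == "__main__":
    print(max_overload_minutes(5360 / 2000))  # 400 kV: inf (continuous)
    print(max_overload_minutes(4490 / 1200))  # 275 kV: 5 (minutes)
```

The 3.7 A case is covered for up to 5 minutes, longer than the 2-minute limit the TGN places on such overloads.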
4.3.3.2. Component Voltage Stress Test
Voltage stress testing is performed to identify any components whose operating
voltage approaches their rated limit. Since all the components are designed to
work below their voltage rating, tests were limited to components subjected to
voltages higher than the analogue or digital processing circuit supply voltages,
typically the power supply components. A voltmeter was used to measure the voltage
across each component with the relay powered up. Table 4-13 shows the voltage
stress testing results for the THR PS10 power supply unit.
Based on the test results for all the tested components of the three relay types, the
voltage across each component is well within its voltage rating. This indicates that
components of a high quality standard, with adequate voltage margins, were used in the
design of the relay. No component has a voltage stress exceeding or approaching its
rating. Consequently, component degradation is not likely to be accelerated by voltage
stress.
Table 4-13: THR PS10 Power Supply Unit Components and Voltage Stress
| Board | Code | Product Code/Description | Voltage Rating (V) | Operating Voltage (V) |
| Output Regulator Board | C5 | Electrolytic capacitor | 63 | 14.63 |
| Output Regulator Board | C6 | Electrolytic capacitor | 63 | 14.71 |
| Output Regulator Board | D9 | Small signal diode | - | 0 |
| Output Regulator Board | R2 | W22 R33 vitreous enamelled wire-wound resistor | 84 | 0.27 |
| Output Regulator Board | R8 | Axial lead polymer film resistor, 3.3 kΩ | 150 | 0.016 |
| Output Regulator Board | R12 | Axial lead polymer film resistor, 470 Ω | 150 | 0.084 |
| Output Regulator Board | R17 | Axial lead polymer film resistor, 330 Ω | 150 | 0.05 |
| Output Regulator Board | R18 | Axial lead polymer film resistor, 220 Ω | 150 | 0.38 |
| Output Regulator Board | R21 | Axial lead polymer film resistor, 1 kΩ | 150 | 0.03 |
| Power Supply (EMI: 997; Date: 16/12/2011; LCRC R6P; B/C: 00013006) | T5 | 2N2222A 307, TO-39 type Si planar epitaxial NPN high-speed switch, metal can package | 75 (collector-base max) | 29.5 |
| Power Supply | R21 | Axial lead polymer film resistor, 4.7 kΩ | 150 | 20.13 |
| Power Supply | R25 | Axial lead polymer film resistor, 3.3 kΩ | 150 | 21.7 |
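The phrase "well within the component voltage rating" can be quantified as the ratio of operating voltage to rated voltage. The sketch below computes this ratio for the Table 4-13 entries that have a rating (D9 is omitted). Treating the T5 reading as a collector-base voltage is an assumption.

```python
# (component, rated voltage V, measured operating voltage V) from Table 4-13.
# D9 is omitted because no rating is listed; T5 uses its collector-base maximum.
THR_PS10 = [
    ("C5", 63.0, 14.63), ("C6", 63.0, 14.71), ("R2", 84.0, 0.27),
    ("R8", 150.0, 0.016), ("R12", 150.0, 0.084), ("R17", 150.0, 0.05),
    ("R18", 150.0, 0.38), ("R21 (regulator)", 150.0, 0.03),
    ("T5", 75.0, 29.5), ("R21 (power supply)", 150.0, 20.13),
    ("R25", 150.0, 21.7),
]


def stress_ratios(rows):
    """Operating/rated voltage ratio per component, highest first."""
    ratios = [(name, op / rated) for name, rated, op in rows]
    return sorted(ratios, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in stress_ratios(THR_PS10)[:3]:
        print(f"{name}: {ratio:.0%}")
```

The highest ratio, about 39% for T5, is still well below unity, and the electrolytic capacitors run at roughly 23% of their 63 V rating.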
4.3.4. In-Depth Evaluation of Modules and Components
In this section, the components on energized relay modules are characterized using
thermal imaging and non-destructive structural evaluation techniques designed to
identify any potential life-limiting conditions. The identified hot components are
compared with their rated capabilities and with electronics industry reliability
experience at different levels of heating. Components requiring further investigation,
based on thermal imaging or other observations, are examined using three-dimensional
X-ray tomographic micro-imaging. These imaging results reveal any signs of degradation
or wear-out, allowing a determination of whether stressed components are still sound
and whether a specific life extension can be forecast. A system-level failure mode,
mechanism and effect analysis (FMMEA) is then performed to determine the function of
each studied component within its relay module and in the overall operation of the
protective relay. The purpose is to identify the particular modules or components most
likely to cause a severe problem in the fault clearance operation of the relay.
4.3.4.1. Thermal Characterisation
A Fluke Ti100 9 Hz thermal imaging camera was used to perform the thermal
characterisation of all three relay types on a module-by-module basis. This
facilitates the identification of components whose temperatures rise above ambient.
A detailed module-by-module analysis for each relay type can be found in the
individual testing reports [70-72]. The in-depth component evaluation of the LFCB
relay is presented in this section as an illustration.
Figure 4-10 and Figure 4-11 show an LFCB disassembled for thermal imaging.
Components identified as operating above ambient temperature, and components close
to the hotspots, are considered most vulnerable to temperature-driven degradation
mechanisms, such as creep and fatigue of device interconnections within integrated
circuits. The thermal imaging observations for each LFCB module are summarised as
follows:
Module 1: Power Supply (GM0026013A): A number of elevated temperature zones
(hotspots) were identified in the power supply module during operation (see Figure
4-10). These include a diode (D34), a voltage regulator (IC47) and three resistors.
Figure 4-10: Thermal Images and Components within LFCB Power Supply Module
Modules 2 & 3: Relay Outputs 1 & 2 (GM0032001A): The modules appear identical, and
in both, a single hotspot was identified: a film resistor (R54), which reached about
31°C, as shown in Figure 4-11.
Figure 4-11: Thermal Images on Components within Modules 2&3 (Relay Outputs 1&2)
Module 4: Communications controller (GM0052021): This module consists of two
PCBs: a communications controller board and a communications interface board. A
number of components on both boards reached above-ambient temperatures during
operation. These are three voltage regulators (IC23, IC24, IC25), two logic ICs (IC34,
IC1), and a variable resistor (RV1).
Module 5: Microcomputer module (GM024001AZ): This module consists of a single
PCB, and exhibited three components operating at above-ambient temperatures. These
were two voltage regulators (IC28 and IC29) and a logic IC (IC1) reaching 36°C and
31°C respectively.
Module 6: Analogue & status input module (GM0036001A): In Module 6, two
components exhibited above-ambient operation: a voltage regulator (IC11) and a PDIP
IC, reaching 31°C and 33°C respectively after a few minutes of operation.
Module 7: Current Transformer input module: This module was not imaged. The stress
testing in Section 4.3.3 determined that fault currents in the National Grid transmission
system do not bring the current transformers close to their rated capacity.
A list of the imaged modules and their components, with the maximum observed
temperatures, is shown in Table 4-14. These components operate at temperatures above
those of the surrounding components. However, at no more than 36°C, the components
are not thermally stressed, nor do they operate at a significant fraction of their power
dissipation capability or operating temperature limit.
Table 4-14: Thermal Imaging of LFCB Relay and Examined Hot Components
| Module | ID | Module/Function | Hotspots | Component(s) |
| 1 | GM0026013A | Power supply | Yes | Voltage regulators (IC47, IC9), resistors ×3, diode D34 |
| 2 | GM0033001A | Relay output 1 | Yes (31°C) | Resistor R54 |
| 3 | GM0033001A | Relay output 2 | Yes (31°C) | Resistor R54 |
| 4 | GM0052021A | Communications controller | Yes (32°C) | Voltage regulators (IC23, IC24, IC25), IC34, IC1, resistor RV1; IC14, R27, R28 (interface board) |
| 5 | GM024001AZ | Microcomputer module | Yes (36°C) | IC28, IC29, IC1 |
| 6 | GM0036001A | Analogue & status input module | Yes (33°C) | IC11 |
| 7 | - | Current transformer input module | Not imaged | - |
4.3.4.2. Detailed Structural Investigation via 3D X-ray Microtomography
Based on the thermal imaging findings, a number of components were identified as
potentially vulnerable to thermal degradation mechanisms. A selection of these
components was therefore subjected to a detailed structural investigation using 3D
X-ray microtomography, which greatly facilitates the non-destructive observation of the
internal structure of engineering materials and structures [73]. This technology enables
cracks and defects to be observed in three dimensions without destroying the specimen
or compromising the results. X-ray images, or projections, of a rotating specimen are
acquired by a stationary detector. These images are reconstructed into a
three-dimensional volume using computer software, from which multiple ‘virtual’
cross-sections (or slices) can be obtained in any plane of interest. The aim of the testing
was to ascertain any existing structural damage or degradation within the components
and its extent. The tomography imaging was performed on an Xradia Zeiss
Versa-XRM500 CT system. A list of the identified vulnerable components requiring
in-depth investigation for each relay type is given in Appendix B.
The X-ray tomography study of the LFCB voltage regulators is presented in this section
for illustration purposes. As shown in Figure 4-12, the voltage regulators on the
communications interface board (Module 4) from LFCB relays with 8- and 15-year
service histories were studied and compared.
Figure 4-12: X-Ray Tomography Images of LFCB Voltage Regulator IC14, Module 4:
(a) 8-year-old relay; (b) 15-year-old relay
Across the regulators studied, the void area in the die attachment ranged from 1.08% to
9.08%, with no pattern of lifting or separation from the substrate. The wire bonding is
sound. None of the observed imperfections affects the reliability of the regulator in this
service. The origin of the voids cannot be determined; they may have been present when
the regulator was new.
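The void percentages above are area fractions of the die attachment seen in a reconstructed slice. Assuming the slice has already been segmented into void and solid pixels over the die-attach region (the segmentation method is not described here and is an assumption), the percentage reduces to a pixel count, as in this minimal sketch with a hypothetical mask.

```python
def void_percentage(mask):
    """Void area as a percentage of the die-attach area in one segmented slice.
    mask is a 2D list of booleans covering only the die-attach region,
    with True marking a void pixel (hypothetical segmentation output)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty die-attach mask")
    voids = sum(sum(1 for px in row if px) for row in mask)
    return 100.0 * voids / total


if __name__ == "__main__":
    # Toy 10 x 10 region with 9 void pixels
    mask = [[False] * 10 for _ in range(10)]
    for k in range(9):
        mask[k][k] = True
    print(f"{void_percentage(mask):.2f}% void")
```

In practice the result depends strongly on the segmentation threshold and on which slice through the die attach is chosen, so values near a few percent should be compared only between scans processed the same way.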
Figure 4-13: X-Ray Tomography Images of Voltage Regulator IC23, 15-year Old Relay
The studies noted that the chip with 9.08% void area came from a relay with 8 years of
service, while Figure 4-13 shows a similar regulator with no visible voids from a relay
Chapter 4: Protection and Control Asset End-of-life Analysis
Page | 128
with 15 years of service history. There is no evidence that ageing in service is affecting
the reliability of these regulators. By contrast, Figure 4-14 shows an image of a
regulator from a different piece of equipment (not a protection relay) with bonding
separation that risks failure. Note that this image is photographically the negative of the
X-ray images in Figures 4-12 and 4-13: the dark area is sound and the light area shows
separation.
Figure 4-14: Acoustic Microscopy Images showing the Evolution of Degradation in a
TO-220 Package Die Attachment during Thermal Cycling
The detailed structural investigation was undertaken on a number of components in each
relay type, as listed in Appendix B. These components are considered more susceptible
to thermally activated degradation mechanisms. Samples of these components, mainly
transistor/IC packages, were extracted from relays with different service histories.
Particular attention was paid to die attachments and wire bonds. Signs of packaging-related
damage, namely die-attachment voiding and cracking, were observed. It is not
possible to say whether the observed damage was present in the as-manufactured
condition or whether it evolved during operation. Overall, the damage observed in
components was not extensive: the percentage void area beneath die attachments did
not exceed 9.08%. Thus, although a gradual degradation in thermal resistance and
electrical performance is expected over time, significant acceleration of the observed
degradation mechanisms is unlikely under the typically benign ambient conditions and
in the absence of significant temperature cycling. In addition, no signs of bond wire
failure were observed. The detailed findings and experimental tests for each relay's
components can be found in the individual testing reports [70-72].
4.3.5. System Level Failure Mode, Mechanism and Effect Analysis
If vulnerable components, or components with ageing-related degradation, can be
identified, a detailed FMMEA can be performed to determine the impact of the
component on the overall relay operation. A relay spends the majority of its life in an
energised but quiescent state, where it is monitoring a healthy but live transmission line.
The two most common failure modes of Power System protective relays are:
Failure to trip when required.
Mal-trip when not required.
Other failure modes may not affect fault clearance but could affect other
functions:
Correct trip, but incorrect operation of other outputs (e.g. indicator lamps, auto-
reclosing signal).
A system-level FMMEA was performed to identify all the relay operational failure modes
that can be caused by each relay module. For modules containing multiple PCBs, the
function of each printed circuit board is described. A detailed description can be found
in the individual report for each relay. This helps National Grid identify the impact of
component failure on relay behaviour and evaluate the risks of extending the lifetime of
the relay. It can be concluded from the FMMEA that none of the components capable of
causing an operating problem is also ranked as likely to fail.
4.4. Conclusions and Future Work
Each of the three relay types yielded consistent evaluation results and demonstrated
eligibility for an asset life extension. Based on the condition and deterioration observed
in the potentially vulnerable components of relay samples approaching their design
lifetime, an initial extension of five years for each relay type was proposed. The
condition of the vulnerable components should be tested again after the five-year
extension. This decision was made because no significant signs of degradation were
observed. However, if the relay components show significant signs of ageing-related
degradation, accelerated lifetime testing (ALT) should be performed to determine by
how much the lifetime can be extended. There is a future opportunity for further
extension by focused
rechecking of the most stressed components as documented in the individual detailed
test reports.
Since the tested relay types continue to perform reliably with no increase in failure rates
or component degradation over many years of service, the flat failure-rate trajectory
does not forecast any specific end of asset life. The proposal to extend asset life by five
years comprises a service life extension of only 15% of the time for which the oldest
evaluated unit has already served. The service life extension is further supported by
thorough technical evaluation of any failure that occurs during the extended life interval,
and re-evaluation of the policy change if any unforeseen failure pattern arises.
4.4.1. Recommendations
Table 4-1 indicates the range of asset lives for the relay types studied, as defined by the
National Grid Policy Statement EPS 12.08, Issue 6. Since all known early-failure
vulnerabilities have been corrected in relays in service today, the important limits are
the anticipated asset life and the latest onset of significant unreliability. Based on the
evaluation results, the following recommendations are made:
1) Based on the results of the project evaluation, this report recommends the following
5-year extensions for all three relay types, as shown in Table 4-15:
Table 4-15: Recommended Relay Lifetime based on Evaluation Results
Relay type | Equipment type | Anticipated asset life (recommended) | Earliest onset of significant unreliability (recommended) | Latest onset of significant unreliability (recommended)
SHNB and THR | Complex electronic relays (transistorised or integrated circuits) | 30 years | 25 years | 40 years
LFCB | Digital (A/D converter and microprocessor) | 25 years | 15 years | 30 years
2) For any targeted relay that fails in service during the next five years, a project team
established by National Grid Protection Engineering shall convene to investigate
the failure and report results, including any impact on replacement life policy, and
on conclusions of this set of reports.
3) For each of the three relay types, a National Grid Technical Document should be
developed specifying the evaluation process to be carried out if units in service are
still performing reliably in 2020. The following steps for one to two aged samples
of a relay type should be included to evaluate any changes from the condition
observed in the present study:
a) Physical inspection.
b) X-ray tomography of specifically targeted ageing components.
c) Fingerprint testing to the baseline established in the respective full test report.
d) See Section 8.10.6 of SHNB Report [70], Section 8.7.6 of THR report [71], or
Section 9.9.7 of LFCB Report [72] for recommendations.
In particular, the X-ray tomography of stressed components already identified
in the present study, performed on one to two relay samples, will show any
new ageing evidence that was not observed in the present work.
4) Complete repetition of the large study documented in full reports [70-72] is not
required on retest.
5) Document the review conclusion for each relay type:
a) Relays are reaching end of service life; revised policy above is appropriate.
b) Relays remain reliable, without impending failures. Propose to further update
replacement policy to extend asset life by an additional five years, with another
recheck in 2025.
6) Consider energising relay spares periodically to restore the dielectric layer within
electrolytic capacitors.
a) Energise at least annually, and for at least one hour. Observe the apparent
operational state via relay panel indications.
b) Assess adequacy of spares inventory for each relay type whose asset life has
been extended.
c) This procedure is valid for all relay types, but is particularly valuable for
ageing relay spares of the types for which this report recommends extension of
service life.
The rated life values tabulated above have all been increased by 5 years for consistency
with the existing policy rating process. It is worth noting that the earliest-onset life
ratings could be revised to reflect the reliable service history of the significant
populations of each type currently in service. However, this would have no practical
impact on replacement plans, which are driven by the anticipated and latest-onset life
ratings.
4.4.2. Further Work and Application to other Equipment Types
The application of the evaluation process to validate asset life extension decisions for
the selected relay types has been described in this chapter. The process is also effective
for use on electronic systems built from components similar to those of the SHNB,
THR and LFCB, which incorporate electronic technologies ranging from transistorised
circuits to microprocessor-based systems with other large-scale integrated circuits and
power semiconductors.
If certain relays (or other critical equipment types) are serving reliably as they approach
policy end-of-life, they are potential candidates for this evaluation and for possible asset
life extension. With the experience gained in the present study, such studies can be
conducted more efficiently. A history of reliable service for a large population of
products can be used as a good pre-filter for choosing products to study.
This report recommends that results for the studied relays should not be broadly applied
to other similar devices unless the electronic design employs the same components or
hardware platform. To the extent that another product shares some of the same design,
only the elements that differ (and the overall fingerprint test performance) need be
evaluated. Otherwise, a full study should be carried out. It is also possible that a
product study may reveal unforeseen risks by showing design weaknesses, degradation,
or impending failures that could not be observed by visual inspection or in normal
functional behaviour; such products may therefore not deliver the anticipated asset life.
While this is not the result that an asset life extension study would hope to establish, it is
valuable to know.
4.5. Summary
The protection and control asset end-of-life analysis carried out in this chapter
effectively evaluated the operational condition of some of the commonly used electronic
and early numerical protective relay types in the National Grid 400 kV and 275 kV
transmission networks. The protection maloperation record indicates that all the selected
relay types are serving in a highly reliable manner, with very few maloperations
attributed to relay hardware failures. Accordingly, all the devices are still within their
useful life, in which failures occur randomly, and no statistical evidence of vulnerable components or
modules could be identified. An evaluation process combining statistical analysis and
sample testing was developed to identify the life-limiting factors and was used to
estimate the reliable service lifetime for each relay type.
Operational behaviour of the studied relays was tested and compared with modern
numerical relays by performing static and dynamic fault-based fingerprint testing. No
degradation in protection function could be identified, and all the studied relays offer
performance equal to that of modern relay types in operational speed and accuracy for
their intended functions. Stress testing was performed to identify components which
operate under voltage, thermal or current stress in the relay's normal operating state. It
was shown that increases in system fault current level and heavy overload current level
will not cause additional thermal stress in the input CTs of the selected relays.
Components found to operate above the ambient temperature are considered to be most
vulnerable to temperature-driven degradation mechanisms, such as creep and fatigue of
device interconnections within integrated circuits. Samples of these components, mainly
transistor/IC packages, were extracted from relays with different service histories for
in-depth study by 3D X-ray microtomography. The non-destructive structural evaluation
indicates that no signs of degradation or wear-out can be found on the studied relay
types. Each of the three relay types yielded consistent evaluation results and has
demonstrated eligibility for an asset life extension. Based on the condition and
deterioration observed to date, an initial extension of five years for each relay type is
proposed. In addition, maintenance and condition monitoring strategies are
recommended to National Grid to ensure effective maintenance of these protection
assets and to provide timely updates if any new ageing evidence emerges.
The critical function of the protection devices in initiating the protection scheme and
preserving system reliability has been discussed in the previous chapters. The
evaluation process proposed provides an effective method for utilities to assess the
condition of the protection devices to ensure their reliable operation and an optimal
replacement plan.
CHAPTER 5
RISK ASSESSMENT OF A SYSTEM
INTEGRITY PROTECTION SCHEME
5.1. Literature Review of SIPS Reliability Assessment Method
As described in the previous chapter, the increased application of SIPS and the high
financial penalty of SIPS maloperation make it necessary to assess SIPS reliability
on a regular basis. In addition, the introduction of modern ICT devices and IEDs, and
the application of new communication protocols in SIPS design, make SIPS reliability and
performance a major concern. All these changes call for a method that effectively
assesses the possible risks induced by SIPS operation. The previously developed SIPS
reliability criteria were reviewed in Chapter 2. These could be used as reliability
standards to evaluate the performance of SIPS and of various reliability enhancement
methods. In this section, the risk assessment methods proposed in previous literature
are reviewed and their effectiveness in modern electrical networks is analysed.
A generic procedure for risk-based assessment was first developed by Fu et al. in [74]
to determine the arming strategy of a SIPS. The optimal arming point was determined
by comparing the operational risk of the test system with and without SIPS. The
reliability assessment method is based on a combination of Markov Modelling and
failure mode and effect analysis (FMEA). It is argued that Markov Modelling is well
suited to SIPS reliability assessment because of its flexibility to account for various
common features and operational states in SIPS. Specifically, it can incorporate
independent and common cause failures, partial and full repairs, maintenance and
diagnostic coverage. The reliability assessment method described in [74] was
applied and extended by Esmaili [75] to assess the risk of SIPS based on online
measurements and to determine the optimal data window for SIPS performance
prediction. The optimal starting time and width of the data window used to predict
SIPS performance are evaluated so that timely actions can be taken to maintain the
stability of a continuously changing Power System.
Besides Markov Modelling, reliability block diagrams (RBD) and fault tree analysis
(FTA) are often adopted in industry to quantify the risk of SIPS [25, 76]. Reference
[25] quantified the probabilities of dependability-based and security-based SIPS
maloperations using FTA, and applied the method to the Dinorwig intertrip scheme in
North Wales. Two reliability indices (i.e. SIL and STL) were adopted as reliability
criteria to quantify the level of SIPS maloperations. The assessment results showed that
FTA is a flexible method for modelling complex protection scheme structures and
handling data uncertainties. Human operator errors and software problems preventing
the scheme from being armed when required were considered in the risk evaluation.
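As an illustration of how FTA combines basic-event probabilities, the sketch below evaluates a small fault tree with AND and OR gates. It assumes independent basic events, each appearing only once in the tree. The tree structure, event names and probabilities are hypothetical and are not taken from [25].

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Basic:
    name: str
    prob: float  # probability of the basic event (assumed independent)


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[Basic, Gate]


def top_event_probability(node: Node) -> float:
    """Evaluate a fault tree bottom-up, assuming independent basic events
    that each appear only once in the tree (no repeated events)."""
    if isinstance(node, Basic):
        return node.prob
    child = [top_event_probability(c) for c in node.inputs]
    if node.kind == "AND":  # output event occurs only if all inputs occur
        p = 1.0
        for q in child:
            p *= q
        return p
    if node.kind == "OR":  # output event occurs if any input occurs
        p_none = 1.0
        for q in child:
            p_none *= 1.0 - q
        return 1.0 - p_none
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: the scheme fails to act if it is not armed
# (operator error OR software problem) OR both signalling channels fail.
tree = Gate("OR", [
    Gate("OR", [Basic("operator error", 0.01), Basic("software problem", 0.005)]),
    Gate("AND", [Basic("channel A fails", 0.02), Basic("channel B fails", 0.02)]),
])
print(top_event_probability(tree))
```

Repeated basic events, such as a common cause failure shared by both channels, would require a minimal cut set treatment rather than this simple recursive evaluation.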
The risk assessment models proposed by Miguel [77, 78] considered the impact of
uncertainty in the protection system and communication system, and the correlation
coefficients between generation and demand, by generating random samples from
normal distributions. These models focus on the trade-off between capability costs,
operational benefits and the risks associated with SIPS operation. The research in [79]
used con-resistant trust to quickly identify maloperation of the protection scheme in
order to mitigate system instability. The con-resistant trust mechanism allows SIPS to
assess the cooperative and defective behaviours of different load points based on
periodic reports. This supports decision making in load shedding to ensure the stability
of system frequency. It was shown that a load shedding protection scheme with the
con-resistant trust mechanism was able to keep the steady-state frequency above the
threshold under a high degree of uncertainty in a Power System. In this assessment,
historical data are used to predict the
SIPS operation risk in the future. This significantly improves the accuracy and
effectiveness of the risk assessment model.
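The following is a minimal sketch of drawing correlated generation and demand samples from normal distributions, in the spirit of the models in [77, 78]. It uses the standard two-variable Cholesky construction, which is not necessarily the construction used in those references. The means, standard deviations (in MW) and correlation coefficient are hypothetical values.

```python
import math
import random


def sample_gen_demand(mu_g, sd_g, mu_d, sd_d, rho, n, seed=None):
    """Draw n (generation, demand) pairs from a bivariate normal distribution
    with correlation coefficient rho, using a 2x2 Cholesky factorisation."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]]
        x_g = z1
        x_d = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        samples.append((mu_g + sd_g * x_g, mu_d + sd_d * x_d))
    return samples


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: generation 1200 +/- 150 MW, demand 1000 +/- 100 MW, rho = 0.6
pairs = sample_gen_demand(1200.0, 150.0, 1000.0, 100.0, rho=0.6, n=20000, seed=1)
print(sample_correlation(pairs))
```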
It has been shown that reliability block diagrams, Markov Modelling and fault tree
analysis are suitable methods for obtaining an overview of SIPS reliability. However, the
deployment of modern ICT significantly increases the complexity of the SIPS structure,
and the impact of the digital communication system on SIPS needs to be reflected in the
reliability assessment procedures. Consequently, a detailed study of the ICT used in
SIPS and of its communication architectures is provided in this chapter. The proposed
method also analyses the possible failure modes coexisting in each component and in
the SIPS. In addition, continuously changing system conditions due to the integration of
renewable energy and demand side management make it difficult to assess the
consequences of SIPS maloperations analytically. Monte Carlo simulation is more
appropriate and accurate for evaluating SIPS risk under various system conditions [80],
but has not yet been applied to SIPS risk assessment. Consequently, a method based on
sequential Monte Carlo simulation is provided in Chapter 6 to capture the variation in
SIPS risk with varying system conditions.
The literature in this area provides not only reliability assessment methods for SIPS
evaluation, but also component reliability data and cost data for various SIPS
operations. The quality of the data greatly affects the usefulness and accuracy of the
risk assessment results. Therefore, to evaluate scheme risks objectively, the reliability
and cost data first need to be agreed upon. These data can be obtained from three main
sources: 1) actual data acquired from a systematic measurement and collection process;
2) data used in previous publications, such as databases or handbooks; 3) data suggested
by experts and experienced engineers. Among them, actual data best reflect the scheme
performance. If assumed data are used, sensitivity analysis can reveal the impact of
variation in the reliability data on the overall scheme risk calculations. It can also help
to identify the most critical components or operational phases of a protection scheme.
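The following is a one-at-a-time sensitivity sketch. It assumes a deliberately simple model in which the scheme behaves as a series chain of independent two-state components, each with steady-state unavailability λ/(λ+μ). Each failure rate is scaled in turn, and the ratio of the perturbed to the base scheme unavailability is reported. The component names, rates (per year) and scaling factor are hypothetical.

```python
def unavailability(lam, mu):
    """Steady-state unavailability of a two-state repairable component."""
    return lam / (lam + mu)


def series_unavailability(components):
    """A series chain is down if any (independent) component is down.
    `components` maps name -> (failure rate, repair rate)."""
    p_up = 1.0
    for lam, mu in components.values():
        p_up *= 1.0 - unavailability(lam, mu)
    return 1.0 - p_up


def one_at_a_time_sensitivity(components, factor=2.0):
    """Scale each failure rate by `factor` in turn and return the ratio of
    perturbed to base scheme unavailability for each component."""
    base = series_unavailability(components)
    result = {}
    for name, (lam, mu) in components.items():
        perturbed = dict(components)
        perturbed[name] = (lam * factor, mu)
        result[name] = series_unavailability(perturbed) / base
    return result


# Hypothetical data: (failures/year, repairs/year)
comps = {"IED": (0.02, 365.0), "switch": (0.05, 365.0), "WAN link": (0.5, 100.0)}
for name, ratio in one_at_a_time_sensitivity(comps).items():
    print(f"{name}: unavailability x{ratio:.3f} when failure rate doubles")
```

The component whose perturbation moves the result most is the most critical one under the stated assumptions.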
5.2. SIPS Risk Assessment Procedures
Based on the previous methods and a thorough study of existing Power Systems and
SIPS infrastructure, a SIPS risk assessment procedure is proposed in this section to
evaluate the risk introduced by SIPS operation and possible maloperations. Normally,
three basic steps are involved in the risk assessment of SIPS:
1) Reliability Assessment: the reliability assessment includes a thorough study of the
SIPS infrastructure and logic and identifies the possible failure modes. The
probability of each SIPS failure mode is then estimated using reliability assessment
methods.
2) Impact Assessment: the consequences of different SIPS operating states under
various system conditions are estimated in terms of financial losses, which reflect the
severity of the impact on the overall Power System. The consequences of a SIPS
maloperation vary with the remedial control actions deployed by the SIPS, its failure
modes and the Power System conditions at the time of the incident.
3) Risk Assessment: the risk of each state is expressed as the product of the state
probability and its corresponding undesirable financial impact.
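Step 3 can be sketched as follows. The state probabilities and financial impacts are hypothetical placeholders, and the three states are treated as mutually exclusive for this sketch. Summing the per-state risks into a total expected cost is also an assumption of this sketch, not a definition taken from the procedure above.

```python
def state_risks(states):
    """Risk of each state = probability x financial impact.
    `states` maps state name -> (probability, impact in monetary units)."""
    total_p = sum(p for p, _ in states.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("state probabilities should sum to 1")
    return {name: p * cost for name, (p, cost) in states.items()}


def total_risk(states):
    """Expected financial impact summed over the mutually exclusive states."""
    return sum(state_risks(states).values())


# Hypothetical probabilities and impacts for one initiating event.
states = {
    "successful operation": (0.998, 2.0e4),
    "DBM": (1.5e-3, 5.0e6),
    "SBM": (5.0e-4, 8.0e5),
}
print(state_risks(states), total_risk(states))
```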
5.2.1. Reliability Assessment
The applications of some typical reliability assessment methods are first discussed in
this section. Figure 5-1 shows a typical procedure for SIPS reliability assessment.
[Flowchart: Identification of relevant SIPS basic components → FMEA on SIPS components → Markov Modelling: estimate probabilities of each component operational state → Reliability Block Diagram: combine individual components' operational states → Compare results with reliability requirements → Sensitivity study: wide range method and risk reduction worth]
Figure 5-1: SIPS Reliability Assessment Procedures
The first step is to identify all the relevant SIPS components. This requires a thorough
study of the physical layout, operating logic and purpose of the investigated scheme.
Next, the possible failure modes of each basic component and their impact on SIPS
performance are examined via Failure Mode and Effect Analysis (FMEA). The
reliability assessment is then carried out using a combination of Markov Modelling and
a reliability block diagram (RBD) to quantitatively assess the probability of each SIPS
operational state. Once the probabilities of being in the DBM state and the SBM state
(i.e. Pr(DBM) and Pr(SBM)) are identified, the reliability indices can be compared with
the standards (e.g. SIL and STL) to determine whether the scheme meets the reliability
requirements. Next, sensitivity studies are applied to investigate the impact of variation
in the reliability data on the assessment results. The importance of each SIPS
component to the reliability performance can also be identified.
The reliability assessment methods applied in the SIPS reliability assessment procedure
are described below:
Failure Mode and Effect Analysis
Failure Mode and Effect Analysis (FMEA) is a systematic method designed to identify
the failure modes of a system in a “bottom-up” way. The entire protection system can be
hierarchically divided into several subsystems and modules and then analysed one
component at a time. The SIPS components identified in the first step of the reliability
assessment are considered as the basic components in the FMEA. A component-level
FMEA is first carried out to determine the possible failure modes of each SIPS
component based on its function and failure mechanism. The impact of component
failures on the performance of SIPS is then determined using a system-level FMEA.
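The following is a minimal sketch of the two-level FMEA idea. A component-level table of failure modes is regrouped by its system-level effect on the scheme. The components, failure modes and their mapping to DBM or SBM are hypothetical examples, not results from this study. A real system-level FMEA would also account for redundancy, which this flat mapping ignores.

```python
# Hypothetical component-level FMEA: component -> {failure mode: effect on SIPS}
COMPONENT_FMEA = {
    "merging unit": {"no output": "DBM", "corrupted sample values": "SBM"},
    "bay IED": {"fails to detect line outage": "DBM", "spurious outage signal": "SBM"},
    "Ethernet switch": {"port failure": "DBM"},
}


def system_level_fmea(component_fmea):
    """Group component failure modes by their effect on the scheme."""
    effects = {}
    for component, modes in component_fmea.items():
        for mode, effect in modes.items():
            effects.setdefault(effect, []).append((component, mode))
    return effects


for effect, causes in system_level_fmea(COMPONENT_FMEA).items():
    print(effect, causes)
```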
Markov Modelling
Once the failure modes have been determined, the state probabilities of the components
and the frequency of entering each state at a given time need to be determined. Markov
Modelling is carried out to represent all the mutually exclusive states in which a SIPS
component can exist and to reflect the random behaviour of the component state as it
varies with time or space. Transitions from one state to another are driven by either a
failure or a repair. Once the failure and repair rates of each
component are known, the state probabilities of the components being in each
operational mode identified by the FMEA at a specific time in the future can be
calculated. In addition, the failure and repair actions can be effectively reflected in the
reliability study.
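In the simplest case, a single repairable component has two states (up and down), a constant failure rate λ and a constant repair rate μ. Its Markov model then has the closed-form solution P_up(t) = μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t) for a component that starts in the up state. The sketch below evaluates this expression and checks it against a forward-Euler integration of the state equations. The rates are hypothetical, and a real SIPS component model with more states (e.g. DBM and SBM modes) would need a larger transition matrix.

```python
import math


def p_up_analytic(lam, mu, t):
    """Probability that a two-state component starting 'up' is up at time t."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def p_up_euler(lam, mu, t, steps=100_000):
    """Integrate dP_up/dt = -lam*P_up + mu*P_down with forward Euler."""
    dt = t / steps
    p_up, p_down = 1.0, 0.0
    for _ in range(steps):
        flow = (-lam * p_up + mu * p_down) * dt  # net probability flow into 'up'
        p_up += flow
        p_down -= flow
    return p_up


lam, mu = 0.1, 52.0  # hypothetical: 0.1 failures/year, one-week mean repair time
print(p_up_analytic(lam, mu, 0.05), p_up_euler(lam, mu, 0.05))
print("steady-state availability:", mu / (lam + mu))
```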
Reliability Block Diagram
The impact of each individual SIPS component's operational state on fulfilling a certain
SIPS function is determined using the Reliability Block Diagram (RBD). The RBD (or
Network Modelling), which is a success-oriented network describing the function of the
system, is built to describe the logical connections of the components needed to fulfil a
specific operation in a SIPS application. SIPS components are represented as
interconnected functional boxes. The resulting network is composed of components in
series, in parallel, or in combined configurations depending on the function needed.
A successful operational function can be viewed as a success path from left to right of
the RBD. Mathematical methods can be applied in combination with the RBD to
quantitatively evaluate the success and failure probabilities, e.g. the Tie Set Method,
Cut Set Method, Conditional Probability Approach and Event Tree.
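The series and parallel rules that underlie simple RBD evaluation can be sketched as follows, assuming independent blocks. The block success probabilities are hypothetical. More general structures, such as bridge networks, need the tie set, cut set or conditional probability methods listed above, and this sketch does not implement those methods.

```python
from math import prod


def series(*p_success):
    """Series blocks: the path succeeds only if every block succeeds."""
    return prod(p_success)


def parallel(*p_success):
    """Parallel blocks: the path succeeds if at least one block succeeds."""
    return 1.0 - prod(1.0 - p for p in p_success)


# Hypothetical path: sensor IED -> (two redundant WAN routes) -> control centre
print(series(0.999, parallel(0.99, 0.99), 0.9995))
```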
5.2.2. Impact Assessment
The consequences of a SIPS maloperation may differ significantly depending on the
system condition at the time of failure. Therefore, the impact of each studied SIPS
operational state is estimated in terms of financial losses under a wide range of system
conditions. The impact of SIPS maloperation includes financial losses associated with
equipment outage, generation curtailment, energy redispatch and load shedding.
For each SIPS, the consequences of three SIPS states need to be assessed: successful
SIPS operation, SIPS DBM and SIPS SBM. In addition, as described in the future UK
energy scenarios [81], significant changes in the UK energy mix are expected over the
coming decades. The deployment of large-scale wind energy and the growth in
weather-dependent distributed generation call for a precise model that can reflect the
variation and uncertainty in both generation and load demand and increase the
accuracy of the SIPS impact assessment. The method is further illustrated in the
numerical studies.
5.2.3. Risk Assessment
Knowing the probability and impact of each SIPS operational state, the risk from each
SIPS operational state is defined as the probability of the state weighted by its
corresponding financial impact. Both analytical and stochastic methods can be applied
for the risk calculation:
1) Analytical Risk Assessment: The analytical risk assessment is more suitable for
analysing an event-based SIPS with predefined protection strategies. It is
normally applied to a simple system with limited variation in load and
generation. The risk from each SIPS operational state is obtained as the
probability of the initiating events weighted by their corresponding financial
impact. The method significantly simplifies the computation procedure by
transferring the system conditions into a multi-level model. However, with more
complex SIPS operational logic and the integration of more renewable
generation, this method is limited in its ability to model precisely the
uncertainties and variations in system conditions.
2) Stochastic Risk Assessment: A SIPS risk assessment procedure based on
sequential Monte Carlo simulation (SMCS) can better reflect time-dependent
SIPS events and the time-series nature of the dynamic load profile and the
generation output. The probability of being in, and the frequency of
encountering, each SIPS operational state can be mapped to represent different
scheme behaviours. Dynamic load profiles and generation prediction models
can also be integrated into the SMCS procedure to evaluate SIPS risk under a
range of system conditions.
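The following is a highly simplified sketch of the sequential Monte Carlo idea. A single repairable SIPS component is simulated chronologically with exponentially distributed up and down times, and the fraction of time spent down is compared with the analytical value λ/(λ+μ). The exponential assumption, the rates and the single-component scope are illustrative only. A full SMCS of a scheme would also step through load and generation profiles and evaluate the impact of each state as it is entered.

```python
import random


def simulate_downtime_fraction(lam, mu, horizon, seed=None):
    """Sequentially sample alternating up/down periods of one repairable
    component over `horizon` time units; return the fraction of time down.
    Up times ~ Exp(lam) and down times ~ Exp(mu) (an assumption)."""
    rng = random.Random(seed)
    t, down_time, is_up = 0.0, 0.0, True
    while True:
        duration = rng.expovariate(lam if is_up else mu)
        if t + duration >= horizon:  # last period is cut at the horizon
            if not is_up:
                down_time += horizon - t
            break
        if not is_up:
            down_time += duration
        t += duration
        is_up = not is_up
    return down_time / horizon


lam, mu = 2.0, 50.0  # hypothetical failures/year and repairs/year
est = simulate_downtime_fraction(lam, mu, horizon=20_000, seed=42)
print(est, lam / (lam + mu))
```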
5.3. SIPS Communication Infrastructure Modelling
5.3.1. Introduction of Studied SIPS Communication Architectures
As described in Chapter 2, operation of SIPS is increasingly reliant on a robust
communication network and the instrumentation, monitoring, communication, control
and protection systems made available by modern IEDs and communication protocols.
Therefore, the first step in SIPS risk assessment is to determine the communication
infrastructure and the operating logic of the SIPS. A substation-based sensor IED
becomes a node of the SIPS, which collects information such as breaker status, current and
voltage signals and phasors.
As shown in Figure 5-2, the studied SIPS communication architectures are represented
as interconnected functional boxes. Four communication architectures based on digital
substations are proposed for a Generator Rejection Scheme (GRS) application,
considering redundancy at different levels in the SAS. For a GRS line-outage
detection system, measurements from the primary network are collected by the
bay-level IEDs and then sent to the station host computer via the substation automation
system (SAS). The information can then be used for local decision making or sent
to the control centre through the wide area network (WAN) for centralised decision
making. A redundant WAN communication path in hot stand-by mode is provided to
enhance the availability of the scheme.
[Figure: architectures Arch1 to Arch4 and substations Sub #1 to Sub #4, showing CB1 to CB4, process buses (PB, PB1, PB2), duplicated IEDs (IED1, IED2) and station LANs (LAN, LAN A, LAN B) within the Substation Automation System (SAS), with line outage information sent over the WAN to the SIPS Control Centre]
Figure 5-2: Protection and Communication Architecture of a GRS.
CB: circuit breaker, PB: process bus communication system, IEDs: Intelligent electronic
devices, LAN: local area networks, WAN: wide area network
Because the line-outage detection function is critical, the IEDs are implemented redundantly
in all the SIPS designs. The IEC 61850-9-2 substation process bus [82] receives the
voltage and current signals digitised by the merging units (MU) and communicates
the data to the bay-level IEDs. In Arch2 and Arch4, independent process bus
communication systems are provided for each bay IED. This is achieved by duplicating
the bay-level Ethernet switches and the connected devices.
Measurements provided by the sensor IEDs are collected by the substation computer
over the substation local area network (LAN). The IEC 62439-3 Parallel Redundancy
Protocol (PRP) allows the bay IEDs to operate via two separate and independent LANs,
as in the last two architectures, Arch3 and Arch4. The IEDs simultaneously send
duplicated Ethernet packets through these two LANs (i.e. LAN A and LAN B).
Consequently, if one data frame fails to reach the host computer due to traffic, the
computer can still receive the required data from the other network without any
reconfiguration time, hence providing seamless redundancy.
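The duplicate-discard behaviour at a PRP receiver can be illustrated with a simplified sketch. The first copy of each frame, identified here by its source address and sequence number, is delivered, and the later copy from the other LAN is dropped. IEC 62439-3 nodes carry the sequence number in a Redundancy Control Trailer and use a bounded duplicate-discard window. The `Frame` fields and the unbounded set used here are simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src: str      # source node address (simplified)
    seq: int      # sequence number shared by the LAN A and LAN B copies
    lan: str      # "A" or "B"
    payload: str


class PrpReceiver:
    """Deliver the first copy of each (src, seq) pair and discard duplicates."""

    def __init__(self):
        self._seen = set()   # unbounded here; real nodes use a bounded window
        self.delivered = []

    def receive(self, frame: Frame) -> bool:
        key = (frame.src, frame.seq)
        if key in self._seen:
            return False     # duplicate from the other LAN: discard
        self._seen.add(key)
        self.delivered.append(frame)
        return True


# Frame 2 is lost on LAN A, but its LAN B copy still reaches the receiver.
rx = PrpReceiver()
for f in [Frame("IED1", 1, "A", "status"), Frame("IED1", 1, "B", "status"),
          Frame("IED1", 2, "B", "status")]:
    rx.receive(f)
print([f.seq for f in rx.delivered])
```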
For each line outage detection system, two independent breaker status signals can be
received from the redundant IEDs. The scheme compares the outputs from the
redundant systems prior to issuing an operation [83]. Therefore, two different tripping
logics can be programmed into each design:
1) Voting (1-out-of-2): if one of the two systems detects a line-outage, the logic solver
actuates the trip decision to initiate the scheme.
2) Vetoing: the logic solver validates the decisions made by the redundant systems
prior to issuing any trip decision. If the outputs of the two systems differ, the
system vetoes the trip decision.
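The two logics can be sketched as Boolean functions. The sketch then enumerates the four outcomes of two independent channels to show the usual trade-off: voting favours dependability and vetoing favours security. The per-channel probabilities q_d (failure to detect a real outage) and q_s (false detection) are hypothetical, and the independence of the two channels is an assumption.

```python
from itertools import product


def voting_1oo2(a_detects: bool, b_detects: bool) -> bool:
    """Trip if either redundant system detects a line outage."""
    return a_detects or b_detects


def vetoing(a_detects: bool, b_detects: bool) -> bool:
    """Trip only if both systems agree; disagreement vetoes the trip."""
    return a_detects and b_detects


def maloperation_probs(logic, q_d, q_s):
    """Return (P(fail to trip | outage), P(false trip | no outage)) for two
    independent channels, by enumerating all channel outcomes."""
    p_fail, p_false = 0.0, 0.0
    for a, b in product([True, False], repeat=2):
        # Outage present: each channel detects with probability 1 - q_d.
        p_outcome = ((1 - q_d) if a else q_d) * ((1 - q_d) if b else q_d)
        if not logic(a, b):
            p_fail += p_outcome
        # No outage: each channel falsely detects with probability q_s.
        p_outcome = (q_s if a else (1 - q_s)) * (q_s if b else (1 - q_s))
        if logic(a, b):
            p_false += p_outcome
    return p_fail, p_false


for logic in (voting_1oo2, vetoing):
    print(logic.__name__, maloperation_probs(logic, q_d=0.01, q_s=0.02))
```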
5.3.2. Communication System Modelling
The studied SIPS communication architectures are represented as interconnected
functional boxes using an RBD. In particular, RBD models for the substation process
bus, substation LANs and SDH/SONET WANs are developed for different reliability
criteria (i.e. dependability or security).
1) Substation Process Bus Sensor Network Architectures:
Different process bus sensor network architectures were illustrated in Figure 3-9. The
RBD models to assess the dependability and security of the sensor network with
duplicated process bus and a 1-out-of-2 voting logic are shown in Figure 5-3. The RBD
for dependability is built by connecting the components used to fulfil the detection
function in series, whilst putting the redundant components in parallel. Consequently,
failures in one of the duplicated devices will not affect the successful operation of the
function. In the RBD developed to evaluate system security, a 1-out-of-2 logic means that
the scheme trips when either system falsely generates an activation signal.
Therefore, the RBD model is constructed by connecting all the elements
capable of causing security failures in series, as shown in Figure 5-3 (b).
Figure 5-3: RBD to Assess the Dependability (a) and Security (b) of the Substation Sensor Network ((a) redundant TS 1/2, IT 1/2, MU 1/2, BIED 1/2, SW 1/2 and IED 1/2 blocks with EM×5 on each branch; (b) series chain TS×2, IT×2, MU×2, BIED×2, SW×2, IED×2)
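The series and parallel rules behind Figure 5-3 can be sketched in a few lines of Python. The sketch assumes independent components and reads the dependability structure of panel (a) from the tie sets of Eq. (5-7): a duplicated TS in series with two redundant branches, each containing IT, MU, BIED, SW, IED and five EM links. The security model of panel (b) places two devices of each type in series and omits EM, since the media cannot originate a spurious signal. The function names and numerical values are hypothetical.

```python
from math import prod


def series(*p: float) -> float:
    """Series blocks: every block must succeed."""
    return prod(p)


def parallel(*p: float) -> float:
    """Parallel blocks: at least one block must succeed (independent blocks)."""
    return 1.0 - prod(1.0 - x for x in p)


DEVICES = ("TS", "IT", "MU", "BIED", "SW", "IED")


def dependability(p: dict) -> float:
    """Fig. 5-3 (a): duplicated TS feeding two redundant branches, each with
    IT, MU, BIED, SW, IED and five Ethernet media (structure read from Eq. 5-7)."""
    branch = series(*(p[k] for k in DEVICES[1:]), p["EM"] ** 5)
    return series(parallel(p["TS"], p["TS"]), parallel(branch, branch))


def pr_spurious_trip(s: dict) -> float:
    """Fig. 5-3 (b) with 1-out-of-2 voting: any of the 12 devices (two of each
    type) can trip the scheme spuriously; EM cannot, so it is excluded."""
    secure = series(*((1.0 - s[k]) ** 2 for k in DEVICES))
    return 1.0 - secure


if __name__ == "__main__":
    p = {k: 0.995 for k in DEVICES} | {"EM": 0.999}   # hypothetical dependabilities
    s = {k: 1e-5 for k in DEVICES}                     # hypothetical Pr(SBM) values
    print("dependability:", dependability(p))
    print("Pr(SBM):", pr_spurious_trip(s))
```

`pr_spurious_trip` returns the probability that at least one of the twelve devices trips spuriously, i.e. the complement of the series "secure" probability.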
2) Station Bus (LAN) Architectures:
Figure 5-4: Communication Path for Multicast GOOSE in PRP based Double-Ring (A and B frames travel from the sender through the ring switches to Receiver 1 and Receiver 2 over primary and secondary paths)
A detailed description of various LAN architectures was provided in Figure 3-6 and
Figure 3-8 in Chapter 3. RBD models are developed for two communication
architectures: the ring architecture and the IEC 62439-3 PRP based double-ring
architecture. The IEC 61850-8-1 Generic Object Oriented Substation
Events (GOOSE) message, which is a multicast message, can be used to transmit
protection data over the LAN of a digital substation within milliseconds. Figure 5-4
shows the communication path for sending a multicast Ethernet frame from the local
bay to two recipients located in other bays via the PRP LANs. RBD models for a 1-to-2
communication service (e.g. a multicast GOOSE message) in a PRP ring LAN are
developed as illustrated in Figure 5-5. It is assumed that there are 10 bays in
each substation.
Figure 5-5: RBD to Assess the Dependability (a) and Security (b) of the PRP Ring LAN (blocks: duplicated bay switches of Bay#1 and Bay#2, Station SW1 and SW2, and Ethernet media links)
3) SDH WANs:
The control centre (NCC) may require information from multiple substations at
different locations to initiate a specific remedial action over a wide area network. The
reliability of the SONET WAN is affected by the number of substations required for the
SIPS application. The communication path in the WAN is similar to that of the double-ring LAN.
Figure 5-6 shows the RBD for a 1-to-2 communication path in a SONET ring WAN.
It is assumed that there are 15 nodes in the SONET ring.
Figure 5-6: RBD for SONET WAN (a) Dependability and (b) Security (blocks: NCC RU, Sub#1 RU and Sub#2 RU; primary ring FI×11 + RU×9; backup ring FI×6 + RU×3)
5.4. SIPS Reliability Assessment
5.4.1. Failure Mode and Effect Analysis
The first step is to determine all the possible failure modes of each component in the
communication architecture model and then determine their overall impact on SIPS
performance. In general, failures of an individual SIPS component can lead to either
a dependability-based maloperation (DBM) or a security-based maloperation (SBM).
Four basic operational states of a component are considered.
Depending on the failure mechanism, some failure modes can be detected by the
self-monitoring function embedded in the device. Each component operational state
and its possible impact on the overall performance of the SIPS are described as follows:
a) Normal State (State 0): In this state, components operate as designed and therefore
meet both dependability and security criteria. When all the components are in this
state, SIPS will operate as designed.
b) DBM, detectable (State 1): The component fails to deliver the designed function when
it is required. This type of DBM failure can be detected by either self-testing or
routine testing. Therefore, faulty devices can be repaired or replaced before they lead to a
SIPS DBM.
c) DBM, undetected (State 2): The component fails to deliver the designed function
when it is required. However, this type of component failure cannot be detected by
either self-testing or routine testing. Consequently, operators are not notified of the
failure of the component. This could lead to a SIPS failure to operate when needed
if no redundancy is provided.
d) SBM (State 3): The component operates when it is not required. This can be caused
by either a spurious operation or hidden failures of the protection devices.
Eventually, it could lead to an unwanted operation of SIPS.
It is worth noting that not all the SIPS components can be in each of the four failure
modes. The possible failure modes need to be analysed based on the function and the
failure mechanism of the studied component. Not all the components are able to
contribute to security-based maloperation (SBM). For example, Ethernet media
(EM) such as optical fibre cables cannot generate a spurious trip signal by
themselves. In addition, some failures can be detected in a timely manner while others remain
hidden.
A SIPS has a complex architecture comprising a number of functional modules, and
each module consists of more than one basic element. Therefore, the reliability block
diagram (RBD) is used to combine individual components’ operating states and
determine the overall operational behaviour of the SIPS. The combination of different
SIPS component states and system contingencies leads to several different overall SIPS
operational states, which can be categorized as follows:
a) SIPS Normal operation: SIPS operates correctly and promptly as designed.
The impact of a successful operation depends on the mitigation action of the
scheme. For example, when a Generator Rejection Scheme (GRS) operates as
designed, it trips a predefined generator. This will cause financial costs
associated with generator start-up and the re-dispatch of its output to other
generators during its outage.
b) SIPS DBM: SIPS fails to take action when it is required during system
contingencies. The consequence of a SIPS DBM is normally severe and
may have a cascading impact on system operation.
c) SIPS SBM: SIPS operates when it is not required. Spurious operation signals
from SIPS components may lead to unwanted SIPS operation. The impact of an
SBM is similar to that of a normal SIPS operation.
5.4.2. Markov Modelling
After determining all the possible failure modes of each SIPS component using FMEA,
the probability of being in each state is estimated by Markov modelling. A 4-state Markov
model, shown in Figure 5-7, was developed to capture all the possible failure
modes coexisting in a SIPS component. It was then used to estimate the probability of
being in, and the frequency of encountering, each state.
Figure 5-7: Markov Model for SIPS Component Reliability Assessment (State 0: Normal Operation; State 1: DBM, detected; State 2: DBM, undetected; State 3: Spurious Trip; transitions from State 0 at rates λdd, λud and λst and back at rates µdd, µud and µst; States 1 and 2 contribute to Pr(DBM), State 3 to Pr(SBM))
Since a component’s mean time to failure (MTTF) encompasses all of its failure
modes, the following equations are used to estimate the failure rates
associated with each operational mode:
$$\lambda_{dd} + \lambda_{ud} + \lambda_{st} = \frac{1}{MTTF} \tag{5-1}$$
$$\lambda_{dd} : \lambda_{ud} : \lambda_{st} = \alpha : \beta : \gamma \tag{5-2}$$
The component reliability data (i.e. MTTF data) used in this
section are based on previous reliability assessment data, as shown in Table 3-3.
Knowing the MTTF, the failure rate of each mode is determined by the parameters α,
β and γ. Due to the self-monitoring capabilities of numerical devices, the majority of
component failure states can be detected in a timely manner. Therefore, it is assumed that
the failure rate of detectable DBM is twice that of undetectable DBM, i.e. β
equals 0.5α. The probability of an SBM is assumed to be the same as that of a
detectable DBM failure, i.e. γ equals α. The repair rates of the detectable failures, µdd
and µst, are both 398.2 (year-1), since faulty devices must be replaced
within 22 hours as required by WECC [18] (8760/22 ≈ 398.2). The maintenance testing, which can detect
hidden failures, is assumed to be carried out once every two years. This leads to a µud
equal to 0.5 (year-1). A sensitivity analysis will be carried out to assess the impact of
uncertainty in the reliability data on the simulation results.
Knowing the failure and repair rates, the transition probability matrix B can be
obtained according to Equation (5-3), with each rate expressed per time interval. The probability of being in each state after m
intervals, P(m), and the frequency of encountering each individual state, f(S), can be
calculated as:
$$B = \begin{bmatrix} 1-\lambda_{dd}-\lambda_{ud}-\lambda_{st} & \lambda_{dd} & \lambda_{ud} & \lambda_{st} \\ \mu_{dd} & 1-\mu_{dd} & 0 & 0 \\ \mu_{ud} & 0 & 1-\mu_{ud} & 0 \\ \mu_{st} & 0 & 0 & 1-\mu_{st} \end{bmatrix} \tag{5-3}$$
$$P(m) = \begin{bmatrix} \Pr(S=0) & \Pr(S=1) & \Pr(S=2) & \Pr(S=3) \end{bmatrix} = P(0)\,B^{m} \tag{5-4}$$
$$f(S) = P(S)\,\lambda_{d}(S) = P(\bar{S})\,\lambda_{e}(S) \tag{5-5}$$
where B^m is the m-th power of the transition matrix B of the Markov model, P(S) and P(S̄) are the
probabilities of being and not being in state S, λd(S) represents the rate of departure
from state S and λe(S) represents the rate of entry into state S.
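A minimal sketch of the component model of Eqs. (5-1) to (5-5), assuming NumPy, an hourly time step, the ratio α:β:γ = 1:0.5:1 and the repair rates stated above. The MTTF of 50 years is a hypothetical value chosen for illustration (the Table 3-3 data are not reproduced here), and the function names are illustrative. Rates given per year are multiplied by the step length so that each row of B sums to one.

```python
import numpy as np

HOURS_PER_YEAR = 8760.0


def failure_rates(mttf_years, ratio=(1.0, 0.5, 1.0)):
    """Split 1/MTTF into (lambda_dd, lambda_ud, lambda_st) per year in the
    ratio alpha:beta:gamma, as in Eqs. (5-1) and (5-2)."""
    total = 1.0 / mttf_years
    return tuple(total * r / sum(ratio) for r in ratio)


def transition_matrix(lams, mus, dt_years):
    """One-step matrix B of Eq. (5-3); rates per year are scaled to one step."""
    l_dd, l_ud, l_st = (x * dt_years for x in lams)
    m_dd, m_ud, m_st = (x * dt_years for x in mus)
    return np.array([
        [1.0 - l_dd - l_ud - l_st, l_dd, l_ud, l_st],
        [m_dd, 1.0 - m_dd, 0.0, 0.0],
        [m_ud, 0.0, 1.0 - m_ud, 0.0],
        [m_st, 0.0, 0.0, 1.0 - m_st],
    ])


def state_probabilities(B, m, p0=(1.0, 0.0, 0.0, 0.0)):
    """P(m) = P(0) B^m, Eq. (5-4); starts in the normal state by default."""
    return np.asarray(p0) @ np.linalg.matrix_power(B, m)


def state_frequencies(P, lams, mus):
    """f(S) = P(S) * departure rate of S, Eq. (5-5), in occurrences per year."""
    departure = np.array([sum(lams), *mus])
    return P * departure


lams = failure_rates(mttf_years=50.0)      # hypothetical MTTF of 50 years
mus = (398.2, 0.5, 398.2)                  # mu_dd, mu_ud, mu_st in year^-1
B = transition_matrix(lams, mus, dt_years=1.0 / HOURS_PER_YEAR)
P = state_probabilities(B, m=10**6)        # about 114 years of hourly steps
f = state_frequencies(P, lams, mus)
print("Pr(DBM) =", P[1] + P[2], "  Pr(SBM) =", P[3])
```

For this star-shaped chain the long-run result can be cross-checked against the closed form P_i = P_0 λ_i/µ_i (i = 1, 2, 3), with P_0 = 1/(1 + Σ λ_i/µ_i), which follows from balancing the flow into and out of each failure state.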
5.4.3. Reliability Block Diagram
Because the Markov model is of limited use in addressing failures caused by combinations of
subsystem states, it is only used to analyse the performance of individual
components. With the probability of each component being in each state determined using the
Markov model, the impact on the overall performance of the SIPS is determined using the
Reliability Block Diagram (RBD). With the RBD model for each communication
architecture built, the Minimal Tie Set Method [84] is used to estimate the reliability of
various GRS operations. A minimal tie set is a path set containing the minimum number
of units needed to guarantee a connection between the input and output of the RBD. For
a system to fail, all the tie sets must fail. For a given architecture, let T1,
T2, …, Tp be the minimal tie sets, Xi the state of component i (Xi = 1 if it works and 0 otherwise; i = 1, …, n), and n the number of
system components. The structure function of the system can then be written as:
$$\phi(\mathbf{X}) = 1 - \prod_{j=1}^{p}\Bigl(1 - \prod_{i \in T_j} X_i\Bigr) \tag{5-6}$$
and the system reliability is the probability that φ(X) = 1.
For the sensor network with duplicated process bus as shown in Figure 5-3 (a), there are
four minimal tie sets:
$$\begin{aligned}
T_1 &= \{TS_1, IT_1, MU_1, BIED_1, SW_1, IED_1, EM_1{\times}5\} \\
T_2 &= \{TS_2, IT_1, MU_1, BIED_1, SW_1, IED_1, EM_1{\times}5\} \\
T_3 &= \{TS_1, IT_2, MU_2, BIED_2, SW_2, IED_2, EM_2{\times}5\} \\
T_4 &= \{TS_2, IT_2, MU_2, BIED_2, SW_2, IED_2, EM_2{\times}5\}
\end{aligned} \tag{5-7}$$
The dependability of the studied architecture can be calculated as:
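$$\begin{aligned}
R_{sys} &= P(T_1 \cup T_2 \cup T_3 \cup T_4) \\
&= P(T_1) + P(T_2) + P(T_3) + P(T_4) - P(T_1T_2) - P(T_1T_3) - P(T_1T_4) - P(T_2T_3) - P(T_2T_4) - P(T_3T_4) \\
&\quad + P(T_1T_2T_3) + P(T_1T_2T_4) + P(T_1T_3T_4) + P(T_2T_3T_4) - P(T_1T_2T_3T_4) \\
&= 4P_{TS}P_{IT}P_{MU}P_{BIED}P_{SW}P_{IED}P_{EM}^{5} - 2P_{TS}^{2}P_{IT}P_{MU}P_{BIED}P_{SW}P_{IED}P_{EM}^{5} \\
&\quad - 2P_{TS}\left(P_{IT}P_{MU}P_{BIED}P_{SW}P_{IED}P_{EM}^{5}\right)^{2} + P_{TS}^{2}\left(P_{IT}P_{MU}P_{BIED}P_{SW}P_{IED}P_{EM}^{5}\right)^{2}
\end{aligned} \tag{5-8}$$
where P_X is the dependability of a component of type X (components of the same type are assumed to have equal dependability), and EM_k×5 denotes the five Ethernet media in branch k.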
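The inclusion-exclusion expansion of Eq. (5-8) can be automated for any list of minimal tie sets. The sketch below assumes independent components; the EM link labels and the per-type dependability values are hypothetical. For the four tie sets of Eq. (5-7) it reproduces the closed form of Eq. (5-8), which also equals (2P_TS − P_TS²)(2P_br − P_br²) with P_br = P_IT P_MU P_BIED P_SW P_IED P_EM^5.

```python
from itertools import combinations
from math import prod


def tie_set_reliability(tie_sets, p):
    """Inclusion-exclusion over minimal tie sets, as in Eq. (5-8).

    tie_sets: list of sets of component names.
    p: component name -> dependability (components assumed independent).
    """
    total = 0.0
    for k in range(1, len(tie_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(tie_sets, k):
            union = set().union(*group)   # every component in these sets must work
            total += sign * prod(p[c] for c in union)
    return total


# Minimal tie sets of Eq. (5-7); the five EM links of each branch are given
# hypothetical labels EM1_1..EM1_5 and EM2_1..EM2_5.
branch1 = {"IT1", "MU1", "BIED1", "SW1", "IED1", *(f"EM1_{i}" for i in range(1, 6))}
branch2 = {"IT2", "MU2", "BIED2", "SW2", "IED2", *(f"EM2_{i}" for i in range(1, 6))}
TIE_SETS = [{"TS1"} | branch1, {"TS2"} | branch1, {"TS1"} | branch2, {"TS2"} | branch2]

# Hypothetical per-type dependabilities; "BIED1" -> "BIED", "EM2_3" -> "EM".
P_TYPE = {"TS": 0.999, "IT": 0.998, "MU": 0.995, "BIED": 0.995,
          "SW": 0.997, "IED": 0.995, "EM": 0.9995}
components = set().union(*TIE_SETS)
p = {c: P_TYPE[c.rstrip("0123456789_")] for c in components}

R_sys = tie_set_reliability(TIE_SETS, p)
print("R_sys =", R_sys)
```

The number of terms grows as 2^p − 1 for p tie sets, so this direct expansion is only practical for small RBDs such as those in this section.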
5.4.4. Reliability Assessment Results
The calculated probabilities of dependability-based maloperation and security-based
maloperation of the described sensor network architectures, LAN and WAN are shown
in Tables 5-1 and 5-2.
Table 5-1: Substation based Sensor Network Reliability Assessment Results
Comm. Arch. | Tripping Logic | Activation Pr(DBM) | Activation Pr(SBM) | Arming Pr(DBM) | Arming Pr(SBM)
Single Process Bus | Voting | 1.65×10^-2 | 2.33×10^-5 | 2.25×10^-2 | 1.83×10^-5
Single Process Bus | Vetoing | 2.86×10^-2 | 1.33×10^-5 | 2.25×10^-2 | 1.83×10^-5
Duplicated Process Bus | Voting | 6.47×10^-3 | 3.16×10^-5 | 1.25×10^-2 | 2.66×10^-5
Duplicated Process Bus | Vetoing | 3.86×10^-2 | 4.99×10^-6 | 1.25×10^-2 | 2.66×10^-5
The reliability assessment results for the sensor network indicate that
implementing a duplicated process bus at substation bay level can significantly
increase the dependability of the arming phase of the SIPS, and of the activation phase when a voting logic is used. However, it
may also compromise security, with an increased Pr(SBM) in the arming phase and, under voting logic, in the activation phase.
Upon receiving the tripping signals from duplicated bay IEDs, a voting tripping logic
delivers better system dependability, while a vetoing logic can effectively prevent
spurious trips. When a vetoing tripping logic is implemented,
duplication in the process bus system can reduce the dependability of line-outage
detection. Meanwhile, regular testing and timely replacement of faulty devices are vital in
keeping the devices in their normal operating state. The reliability of the single ring and
PRP ring LAN architectures and of the SDH ring WAN architecture is estimated using
the RBD, as shown in Table 5-2. The single ring LAN architecture may not be sufficient for
SIPS applications due to its low dependability. These reliability data are then used in
the numerical studies to illustrate the impact of the communication architectures on
SIPS performance.
Table 5-2: LAN and WAN Reliability Assessment Results
Comm. System | Architecture | Pr(DBM) | Pr(SBM)
LAN | Single Ring LAN | 9.68×10^-3 | 7.79×10^-6
LAN | PRP Ring LAN | 9.38×10^-5 | 1.56×10^-5
WAN | SDH Ring | 1.52×10^-4 | 1.98×10^-5
5.5. Risk Assessment Numerical Illustration: Analytical Method
In this section, the analytical SIPS risk assessment method is applied to assess an event-
based Generator Rejection Scheme (GRS) implemented in a 3-bus system as shown in
Figure 5-8. The GRS comprises two basic operational phases: the activation phase,
which continuously monitors the status of the two critical lines (i.e. Line 1 and Line 2),
and the arming phase, which monitors the load level at Load 3 and arms the GRS only
when it is higher than a certain level. DC power flow was used in this analysis. Table 5-3
shows the generation and load levels of the test system. The thermal limit of
all circuits is set to 200 MW.
Figure 5-8: 3-Bus System with Generator Rejection Scheme (GRS)
Table 5-3: Generation Data of the 3-bus System
Bus | Generator | Min Capacity (MW) | Max Capacity (MW) | Peak Load (MW)
B1 | G1 | 175 | 500 | 170
B2 | G2, G3 | 150 | 400 | 100
B3 | - | - | - | 500
5.5.1. GRS Operating Logic
The purpose of the GRS is to maximize the power transfer from the low-cost generators
at bus B2 to bus B3. Based on the DC power flow results, when the load level at Load 3
exceeds 400 MW, an outage of either Line 1 or Line 2 will result in the overload and
cascade tripping of the other line. Hence, the GRS needs to be armed to trip G2 if a
critical line outage is detected, and the output of G2 must be redispatched to G1.
Without the GRS, or in the case of a GRS DBM during high load demand conditions, an
outage of a critical line will lead to cascade tripping of the transmission lines
interconnecting Load 3, which may eventually isolate the load from the
rest of the system. In addition, there is a risk that the generators connected to B2
would also be tripped due to an out-of-step condition following this significant
decrease in load demand. The implementation of the GRS could effectively mitigate the
overloading on the stressed circuit by redispatching the output of the tripped generator
at B2 to the generator at B1. When there is a fault on one of the critical lines (line 1 or
line 2), the fault should be cleared by opening the circuit breaker at both ends of the
circuit. The scheme will then be activated after receiving the trip signal from the
protection device. At the same time, if the scheme is armed, generator G2 will be
tripped to prevent the overload of the healthy transmission line between B2 and B3.
5.5.2. Analytical Risk Assessment Procedures
5.5.2.1. Identification of Initiating Events
The event-based GRS can be activated by two basic events, F1 and F2, which represent
the outage of Line 1 and Line 2 respectively. Based on the IEEE RTS
reliability data [85], the line outage rate (λline) is equal to 4.57×10-5 (hr-1) (MTTF = 2.5
years). The probability of each basic event in the next hour is obtained by modelling
the time to line outage with an exponential distribution:
$$\Pr(F_i) = 1 - e^{-\lambda_{line} t} = 5.71 \times 10^{-5} \tag{5-9}$$
$$\Pr(\bar{F}_i) = 1 - \Pr(F_i) = 0.99994 \tag{5-10}$$
where Pr(Fi) and Pr(F̄i) represent the probability of occurrence and non-occurrence of
the initiating event Fi respectively.
Additionally, the GRS is only armed when the load level at Load 3 is higher
than a pre-specified value (Load 3 > 400 MW), the probability of which can be estimated
by analysing the year-round load profile provided by the IEEE RTS load model. With
the peak load at B3 being 500 MW, the probability of Load 3 being at a high load level
that requires the GRS to be armed is estimated to be 12.45%. Therefore, the
probability of the scheme being armed, Pr(L), is:
$$\Pr(L) = 12.45\% \tag{5-11}$$
Next, five initiating events are formed by considering all the possible combinations of
the basic events:
E1: No line outage. GRS not armed.
$$\Pr(E_1) = \Pr(\bar{F}_1) \times \Pr(\bar{F}_2) \times \Pr(\bar{L}) \tag{5-12}$$
E2: No line outage. GRS armed.
$$\Pr(E_2) = \Pr(\bar{F}_1) \times \Pr(\bar{F}_2) \times \Pr(L) \tag{5-13}$$
E3: One line outage. GRS not armed.
$$\Pr(E_3) = [\Pr(F_1) \times \Pr(\bar{F}_2) + \Pr(\bar{F}_1) \times \Pr(F_2)] \times \Pr(\bar{L}) \tag{5-14}$$
E4: One line outage. GRS armed.
$$\Pr(E_4) = [\Pr(F_1) \times \Pr(\bar{F}_2) + \Pr(\bar{F}_1) \times \Pr(F_2)] \times \Pr(L) \tag{5-15}$$
E5: Outage of both lines.
$$\Pr(E_5) = \Pr(F_1) \times \Pr(F_2) \tag{5-16}$$
Since the adjacent circuits are considered to be independent, the probability of
simultaneously losing both lines is negligible. Consequently, event E5 is not considered
in the case study. Among these initiating events, only E4 requires a GRS operation.
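The initiating-event probabilities of Eqs. (5-12) to (5-16) can be evaluated directly. The sketch below takes Pr(Fi) and Pr(L) as reported in Eqs. (5-9) and (5-11), assumes the two line outages and the load level are independent, and uses a hypothetical function name.

```python
PR_F = 5.71e-5   # Pr(F_i): outage of one critical line within the next hour, Eq. (5-9)
PR_L = 0.1245    # Pr(L): scheme armed (Load 3 > 400 MW), Eq. (5-11)


def initiating_events(pr_f1=PR_F, pr_f2=PR_F, pr_l=PR_L):
    """Pr(E1)..Pr(E5) of Eqs. (5-12)-(5-16), assuming independent basic events."""
    nf1, nf2, nl = 1.0 - pr_f1, 1.0 - pr_f2, 1.0 - pr_l
    one_line_out = pr_f1 * nf2 + nf1 * pr_f2
    return {
        "E1": nf1 * nf2 * nl,
        "E2": nf1 * nf2 * pr_l,
        "E3": one_line_out * nl,
        "E4": one_line_out * pr_l,
        "E5": pr_f1 * pr_f2,
    }


if __name__ == "__main__":
    for name, pr in initiating_events().items():
        print(f"Pr({name}) = {pr:.4e}")
```

Because E1 to E5 are mutually exclusive and exhaustive, their probabilities sum to one, which gives a simple consistency check; Pr(E5) evaluates to roughly 3×10^-9, in line with the decision to neglect it.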
5.5.2.2. Formulate GRS Risk Expression
The probability of each system initiating event combined with a GRS operational
state can be denoted as Pr(Ei(T, T̄)), where (T, T̄) denotes whether the GRS operates (T)
or not (T̄). The following situations are then considered (“Act1” and “Act2” represent the
line-outage detection systems of Line 1 and Line 2 respectively, and “Arm” represents the
load monitoring system used to arm the GRS). Im(Normal), Im(DBM) and Im(SBM)
represent respectively the impact of SIPS normal operation, DBM and SBM on system
operation under a particular system condition:
1) Situation 1 (E1T): Unwanted GRS operation during initiating event E1, i.e. the GRS
operates when it is not armed and there is no fault on Line 1 or Line 2. This requires
security-based misoperation in both the activation and arming phases.
$$\Pr(E_1T) = \Pr(E_1) \times [\Pr(\text{SBM\_Act1}) + \Pr(\text{SBM\_Act2})] \times \Pr(\text{SBM\_Arm}) \tag{5-17}$$
$$\mathrm{Im}(E_1T) = \mathrm{Im}(SBM) \tag{5-18}$$
2) Situation 2 (E2T): Unwanted GRS operation when the GRS is armed but there is no circuit
outage in the system. This situation is caused by SBM in either of the two line-outage
detection systems.
$$\Pr(E_2T) = \Pr(E_2) \times [\Pr(\text{SBM\_Act1}) + \Pr(\text{SBM\_Act2})] \tag{5-19}$$
$$\mathrm{Im}(E_2T) = \mathrm{Im}(SBM) \tag{5-20}$$
3) Situation 3 (E3T): Unwanted GRS operation when a fault occurs on one circuit
but the load level is lower than the triggering point. The GRS trips due to SBM in the
arming phase.
$$\Pr(E_3T) = \Pr(E_3) \times \Pr(\text{SBM\_Arm}) \tag{5-21}$$
$$\mathrm{Im}(E_3T) = \mathrm{Im}(SBM) \tag{5-22}$$
4) Situation 4 (E4T̄): A fault occurs on one critical circuit while the load level
at Load 3 exceeds 400 MW. The GRS fails to operate due to dependability-based
misoperation (DBM) in either the arming or the activation phase.
$$\Pr(E_4\bar{T}) = \Pr(E_4) \times [\Pr(\text{DBM\_Act}) + \Pr(\text{DBM\_Arm})] \tag{5-23}$$
$$\mathrm{Im}(E_4\bar{T}) = \mathrm{Im}(DBM) \tag{5-24}$$
5) Situation 5 (E4T): A fault occurs on one critical circuit while Load 3 exceeds
400 MW. The GRS operates as designed.
$$\Pr(E_4T) = \Pr(E_4) \times [1 - \Pr(\text{DBM\_Act}) - \Pr(\text{DBM\_Arm})] \tag{5-25}$$
$$\mathrm{Im}(E_4T) = \mathrm{Im}(Normal) \tag{5-26}$$
The impacts of GRS normal operation, DBM and SBM are estimated as described in
Table 5-4 by considering the corresponding consequences and financial impacts [74, 86].
In the case of a successful GRS operation, generator G2 connected to Bus B2 is
tripped by the scheme for 2 hours. As illustrated in Equation (5-27), the financial cost is
associated with the start-up of G2 and the redispatch of G2’s output to other generators
during the next two hours. In the case of a DBM during high loading conditions, an outage
of either critical line (i.e. Line 1 or Line 2) will result in overloading and cascade tripping of the
transmission lines interconnecting Load 3, isolating it from the generation plants. Outage
of the critical lines will also cause an out-of-step condition for all the generators
connected to Bus B2. It is assumed that all the generators at the plant (G2 and G3) will
accelerate and then be tripped due to over-speed. Load 2 will then be supplied by G1.
The impact caused by an SBM is the same as the impact of normal operation.
Table 5-4: Impact Assessment for GRS Misoperation [74, 86]
Cost Item | Quantity (MW) | Duration (hrs) | Cost ($/MWh or $/case)
Successful Operation
Unit start-up | G2 | - | 5,000 $/case
Re-dispatch | PG2 = 194.5 | 2 | 50 $/MWh
Dependability-based Misoperation
Load shedding | PLOAD_3 = 424 | 2 | 18,000 $/MWh
Unit start-up | G2, G3 | - | 5,000 $/case
Re-dispatch | PLOAD_2 = 100 | 2 | 50 $/MWh
Security-based Misoperation
Unit start-up | G2 | - | 5,000 $/case
Re-dispatch | PG2 = 194.5 | 2 | 50 $/MWh
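Based on the assessment, the impact of each GRS state can be calculated as:
$$\mathrm{Im}(Normal) = P_{G2} \times 50 \times 2 + 5000 = \$24{,}455 \tag{5-27}$$
$$\mathrm{Im}(DBM) = P_{LOAD_3} \times 18000 \times 2 + P_{LOAD_2} \times 50 \times 2 + 10000 = \$15{,}284{,}000 \tag{5-28}$$
$$\mathrm{Im}(SBM) = P_{G2} \times 50 \times 2 + 5000 = \$24{,}455 \tag{5-29}$$
The risks induced by the GRS can be calculated as:
$$Risk(Normal) = \Pr(E_4T) \times \mathrm{Im}(Normal) \tag{5-30}$$
$$Risk(DBM) = \Pr(E_4\bar{T}) \times \mathrm{Im}(DBM) \tag{5-31}$$
$$Risk(SBM) = [\Pr(E_1T) + \Pr(E_2T) + \Pr(E_3T)] \times \mathrm{Im}(SBM) \tag{5-32}$$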
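The risk expressions can be combined into a short calculation. This sketch assumes that Act1 and Act2 have identical activation-phase probabilities, takes the phase probabilities from the duplicated-process-bus, voting row of Table 5-1, and uses the impacts of Eqs. (5-27) to (5-29). It ignores the LAN and WAN contributions and any other modelling details behind Figure 5-10, so its output is illustrative and is not expected to reproduce that figure. Because Pr(Fi) is an hourly probability, the results are expected costs in $/hr. The function name is hypothetical.

```python
def grs_risk(ev, act_dbm, act_sbm, arm_dbm, arm_sbm,
             im_normal=24_455.0, im_dbm=15_284_000.0, im_sbm=24_455.0):
    """Hourly risk ($/hr) of the GRS following Eqs. (5-17)-(5-32).

    ev: Pr(E1)..Pr(E4); Act1 and Act2 are assumed to share act_dbm/act_sbm.
    Impacts default to the values of Eqs. (5-27)-(5-29).
    """
    pr_e1t = ev["E1"] * (act_sbm + act_sbm) * arm_sbm    # (5-17)
    pr_e2t = ev["E2"] * (act_sbm + act_sbm)              # (5-19)
    pr_e3t = ev["E3"] * arm_sbm                          # (5-21)
    pr_e4_fail = ev["E4"] * (act_dbm + arm_dbm)          # (5-23)
    pr_e4_ok = ev["E4"] * (1.0 - act_dbm - arm_dbm)      # (5-25)
    return {
        "Normal": pr_e4_ok * im_normal,                  # (5-30)
        "DBM": pr_e4_fail * im_dbm,                      # (5-31)
        "SBM": (pr_e1t + pr_e2t + pr_e3t) * im_sbm,      # (5-32)
    }


pf, pl = 5.71e-5, 0.1245                 # Pr(F_i) and Pr(L), Eqs. (5-9), (5-11)
one_out = 2 * pf * (1 - pf)              # exactly one critical line out
ev = {"E1": (1 - pf) ** 2 * (1 - pl), "E2": (1 - pf) ** 2 * pl,
      "E3": one_out * (1 - pl), "E4": one_out * pl}
# Phase probabilities from Table 5-1, duplicated process bus with voting logic.
risk = grs_risk(ev, act_dbm=6.47e-3, act_sbm=3.16e-5,
                arm_dbm=1.25e-2, arm_sbm=2.66e-5)
for name, value in risk.items():
    print(f"Risk({name}) = {value:.3f} $/hr")
```

With these illustrative inputs the DBM term is the largest of the three, mainly because Im(DBM) is several hundred times Im(SBM).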
The probabilities used in the GRS risk expression can also be estimated using fault
tree analysis (FTA). For example, the probability of a GRS DBM when a fault
occurs (Situation 4), Pr(E4T̄), can be estimated using the FTA shown in Figure 5-9. The
top event occurs when the scheme is in the DBM state while GRS operation is
required. A GRS DBM is caused by the absence of either the line-outage identification (activation
phase) or the arming signal (arming phase). GRS operation is required when there is a fault
on Line 1 or Line 2 (Pr(F1) + Pr(F2)) and the load level at Load 3 exceeds 400 MW.
Figure 5-9: Fault Tree Analysis (FTA) to Assess the Probability of GRS DBM
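Assuming independent basic events, the fault tree described above can be evaluated with the usual AND/OR gate rules. The gate structure below is read from the textual description of Figure 5-9 (the figure itself is not reproduced here), and the phase probabilities are again taken, for illustration only, from the duplicated-process-bus, voting row of Table 5-1. The sketch compares the exact OR-gate result with the additive approximation used in Eq. (5-23); the gate function names are hypothetical.

```python
def and_gate(*probs: float) -> float:
    """AND gate with independent inputs: all input events occur."""
    out = 1.0
    for p in probs:
        out *= p
    return out


def or_gate(*probs: float) -> float:
    """OR gate with independent inputs: at least one input event occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


pf, pl = 5.71e-5, 0.1245                    # Pr(F_i), Pr(L): Eqs. (5-9), (5-11)
dbm_act, dbm_arm = 6.47e-3, 1.25e-2         # Table 5-1, duplicated bus, voting

grs_required = and_gate(or_gate(pf, pf), pl)     # line outage AND scheme armed
grs_dbm = or_gate(dbm_act, dbm_arm)              # activation OR arming fails
top_event = and_gate(grs_required, grs_dbm)      # GRS needed but does not act

# Additive form of Eq. (5-23), with Pr(E4) from Eq. (5-15).
pr_e4 = 2 * pf * (1 - pf) * pl
eq_5_23 = pr_e4 * (dbm_act + dbm_arm)
print(f"FTA: {top_event:.4e}   Eq. (5-23): {eq_5_23:.4e}")
```

The two values differ by well under one percent here; the additive form is slightly conservative because it does not subtract the joint failure of both phases.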
5.5.3. Analytical Risk Assessment Results
The analytical assessment method is applied to evaluate the performance of the local
SIPS. Two different sensor network communication architectures, as illustrated in
Figure 3-9 in Chapter 3, are considered, while the reliability of the wide area
communication networks is not addressed. The scheme risk induced by
communication architectures with different tripping logics is shown in Figure 5-10.
Figure 5-10: GRS Risk Assessment Results for Different Sensor Network Architectures
The risk associated with normal GRS operation (“Operating Cost”) stays
approximately the same for all four GRS designs. DBM risk is the main
operating risk for all the studied designs. This is because in the three-bus system, a DBM
of the GRS will lead to a complete loss of Load 3, which has a considerably larger
impact than an SBM. Moreover, the use of the vetoing logic delivers better
performance in terms of SBM. However, it also leads to a noticeable increase in the risk of DBM,
especially for the scheme with a high level of redundancy (e.g. Arch2 vetoing).
Therefore, implementation of process-level communication system redundancy requires
careful consideration and does not necessarily lead to a more reliable scheme.
The previous assessment considers the GRS with the line-outage information acquired
from the local substation. With telecommunication facilities between substations, line-outage
signals from detection devices at the remote terminal of a transmission line can be
acquired by the scheme as an intertripping (I/T) signal, which could significantly
enhance the dependability of the scheme. Figure 5-11 compares the risks
of the previously assessed GRS designs (Arch2 voting & vetoing) with the risks of GRS
designs using the intertripping signal from the remote end of the transmission line.
Monitoring the line status at both ends of the line delivers a lower DBM risk.
However, as a trade-off, it leads to an increased risk of SBM for both designs.
Meanwhile, an additional time delay may be introduced into GRS decision making
by the transmission of signals from the remote terminal to the local scheme
programmable logic controllers (PLC).
Figure 5-11: GRS Risk Comparison with and without Intertripping (I/T) Signal
5.6. Sensitivity Study
Due to the high uncertainty of the data used in the reliability assessment, sensitivity
analysis is carried out to determine the impact of the variation in the assumed data on
the risk evaluation results. Moreover, sensitivity analysis can also determine the
weakest operational phase of the GRS, which needs to be improved to fulfil the reliability
requirements. The factors affecting the GRS risks and the selection of the optimal
design are listed as follows:
a) The reliability of electronic components: mean time to failure (MTTF) and mean
time to repair (MTTR) data.
b) The frequency of scheme initiating event (e.g. critical line outage rate λline for
GRS).
c) The probability of the system being in its overloading state (Pr(L)).
The first factor focuses on the reliability of the SIPS infrastructure and its maintenance
strategy, while the last two factors consider the impact of power system conditions on
the assessment results. In particular, the wide range method is used to vary each
of these factors over a wide range and examine its impact on the risk assessment results.
The sensitivity analysis results are then used to determine the optimal scheme design
under different system conditions.
5.6.1. Impact of Component Reliability on GRS Risk
The reliability of the electronic components used in the SIPS affects the successful
arming process and the detection of the scheme initiating events. The Mean Time to
Failure (MTTF) is affected by many factors such as vendor, age, weather, etc. In Figure 5-12,
the MTTF and MTTR of the SIPS components are varied from 0.1 to 10 times their original
values. Changes in MTTF and MTTR affect the failure rates (λ) and repair
rates (µ) of a component, as illustrated in Equation (5-1). Hence, the probabilities of a
component being in each operational state of the Markov model also vary. With an
increased MTTF, the overall scheme risk reduces. Conversely, an
increase in MTTR (a decrease in µ), which means less frequent inspection and
maintenance, leads to higher scheme risks.
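The component-level part of the wide range method can be sketched by scaling MTTF or MTTR by factors between 0.1 and 10 and re-evaluating the steady-state probabilities of the four-state model. This sketch assumes NumPy, a hypothetical base MTTF of 50 years, the rates and ratio stated in Section 5.4.2, and that scaling MTTR divides all three repair rates (including the testing-driven µud) by the same factor. Propagating these probabilities through the RBDs and risk expressions, as done for Figure 5-12, is not shown, and the function name is hypothetical.

```python
import numpy as np

BASE_MTTF_YEARS = 50.0                     # hypothetical base MTTF
BASE_MU = np.array([398.2, 0.5, 398.2])    # mu_dd, mu_ud, mu_st in year^-1
RATIO = np.array([1.0, 0.5, 1.0])          # alpha : beta : gamma


def component_pr(mttf_scale=1.0, mttr_scale=1.0):
    """Steady-state Pr(DBM) and Pr(SBM) of one component of the 4-state model."""
    lam = RATIO / RATIO.sum() / (BASE_MTTF_YEARS * mttf_scale)
    mu = BASE_MU / mttr_scale              # longer MTTR -> lower repair rate
    odds = lam / mu                        # P_i / P_0 for states 1, 2, 3
    p0 = 1.0 / (1.0 + odds.sum())
    return {"DBM": p0 * (odds[0] + odds[1]), "SBM": p0 * odds[2]}


scales = np.logspace(-1, 1, 9)             # 0.1x ... 10x the base value
dbm_vs_mttf = [component_pr(mttf_scale=k)["DBM"] for k in scales]
dbm_vs_mttr = [component_pr(mttr_scale=k)["DBM"] for k in scales]
for k, a, b in zip(scales, dbm_vs_mttf, dbm_vs_mttr):
    print(f"x{k:5.2f}: Pr(DBM) MTTF-scaled {a:.3e}, MTTR-scaled {b:.3e}")
```

Pr(DBM) falls monotonically as MTTF grows and rises as MTTR grows, which is the component-level mechanism behind the trends described above.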
Variations in component reliability data do not affect the selection of the sensor
network, since the risk introduced by the GRS using vetoing logic and the intertripping
signal remains the lowest. However, maintaining high device reliability and carrying out
more frequent inspection prove to be effective methods of enhancing GRS performance.
Figure 5-13 illustrates the improvement in GRS (Arch2, voting) performance obtained by
enhancing the reliability or maintenance of different GRS operational
phases. The entry points at which the GRS risk falls below a certain level (1 $/hr) for
different strategies are shown in Table 5-5.
Figure 5-12: Impact of MTTF and MTTR on Risks of Different GRS Designs
Increasing the MTTF of either the activation phase or the arming phase or both of them
could deliver enhanced overall GRS performance. For example, the total scheme risk
drops from 1.2 $/hr to 1 $/hr at approximately 1.4×MTTFbase when both operational
phases are enhanced, 1.6×MTTFbase when only the activation phase is enhanced and
3.9×MTTFbase when only the arming phase is enhanced. An increase in system MTTR means
less frequent scheme testing and maintenance, and therefore leads to higher scheme risks.
activation phase has a more significant impact on enhancing system reliability as
compared with the arming phase.
Table 5-5: Entry Point for GRS Risk to Reach below 1 $/hr
SIPS Phase | MTTF (×MTTF_base) | MTTR (×MTTR_base)
Activation | 1.6 | 0.65
Arming | 3.9 | 0.25
Both Phases | 1.4 | 0.77
Figure 5-13: Impact of Reliability of each GRS Phase on Overall Risks for Local GRS
Left: Enhancing MTTF on GRS (Arch2, Voting) Risk
Right: Enhancing MTTR on GRS (Arch2, Voting) Risk
5.6.2. Impact of System Conditions on GRS Risk
The risk of the studied event-based GRS also varies with system conditions such as
line-outage rate (λline) and system load levels. Variation in line-outage rate λline affects
the frequency of the GRS being triggered. The reliability of the transmission line can be
affected by many factors, such as weather conditions, power flow, terrain, etc. Figure 5-14
demonstrates the impact of the line outage rate on different GRS designs. A higher line
outage rate leads to an increased overall risk for all the GRS designs. Variation in the
line-outage rate has very little impact on the risk caused by security-based misoperation (i.e.
Risk(SBM)). However, the risk of DBM and the risk of normal operation increase
linearly with the line-outage rate. Consequently, when the critical lines are highly
reliable, Risk(SBM) becomes the main contributor to the overall GRS risk. Conversely,
for unreliable lines, Risk(DBM) and Risk(Normal) significantly affect the
overall GRS risk. More specifically, when the line outage rate is less than 0.7 times the
base value (i.e. λline-new=0.7×λline), schemes using only the local line-outage information
have a better performance than the schemes using both local and remote line-outage
signals. However, when the scheme is required to operate frequently, it is more effective
to have a SIPS design with better dependability.
Figure 5-14: Impact of Critical Line Outage Rate on GRS Risks
Variations in the Load 3 annual peak load might change the GRS risk in two different
ways: Firstly, the frequency of GRS operation is changed due to the change in the
probability of the scheme being in the arming state. Secondly, the financial impact of
losing Load3 is also changed due to the change in its quantity. Figure 5-15 shows the
variation in GRS risks with the annual peak load at Load3. Load1 and Load2 are kept
fixed in this wide range analysis. When the Load 3 annual peak value is relatively low,
Risk(SBM) becomes the main contributor to the scheme risk. Consequently, scheme
designs with better security deliver better overall performance. However, at higher
Load 3 levels, the GRS is required to operate more frequently, resulting in higher DBM risks.
Under these circumstances, introducing redundancy in the communication system or in
the activation signal could minimize the risk from DBM.
Figure 5-15: Impact of Load Level on GRS Risks
5.7. Summary
A SIPS risk assessment procedure based on FMEA, Markov models and reliability block diagrams is described in this chapter and applied to a portion of the IEEE Reliability Test System with GRS logic. The probabilities of the different SIPS operational states and their impact on system integrity are evaluated. The procedures are used to quantify the risks of GRS designs with different sensor network architectures and tripping logics, and the optimal design can be determined by comparing their annual costs. A relatively low variation in scheme operational risk is also vital in a system whose operating conditions change continuously.
By comparing the performance of GRS implemented with different sensor network architectures, it can be concluded that duplicated process bus communication systems do not necessarily lead to better performance, since they also cause a noticeable decrease in scheme security. The risk from scheme SBM can be effectively controlled by applying vetoing tripping logic to the redundant line-outage detection IEDs. In addition, a centralized GRS can provide enhanced performance compared with a local GRS. The proposed methodology can help utilities understand the impact of ICT on SIPS performance and how the scheme architecture can be designed to balance the trade-off between SIPS dependability and security.
Applying sensitivity studies to the factors governing scheme performance provides useful guidance for utilities in allocating inspection and maintenance effort. The wide-range method can be used to assess the effectiveness of different reliability enhancement strategies in minimizing the risk following undesirable SIPS operations, and it helps identify the most critical factors for dependable and secure scheme operation. Moreover, by assessing the variation in operational risk under different system conditions, the scheme operator can select the optimal GRS logic design for the current system condition.
CHAPTER 6
RISK OF IMPLEMENTING SIPS IN A
SYSTEM WITH LARGE-SCALE WIND
INTEGRATION
6.1. Future UK Power System
The need to decrease the carbon intensity of the electricity system requires an increasing penetration of renewable energy and the removal of generation based on fossil fuels. The renewable generation being connected to the system, notably wind energy, is bringing significant challenges to the transfer capability of the transmission network. In addition, wind farms are generally remote from the load, so the existing transmission lines are normally expected to operate closer to their limits, particularly when the wind intensity is high. SIPS applications motivated by wind energy are therefore becoming increasingly attractive in long-term system planning to mitigate the impact of cascading events triggered by extreme contingencies, because of the relatively low cost of SIPS compared with transmission expansion. To assess the risk of SIPS operation effectively, the risk assessment model needs to address the changes expected in the future Power System [87]. The Great Britain Power System involves significant deployment of wind generation in
Scotland and off-shore, whilst the demand is mainly in Southern England. A generator rejection scheme (GRS) is a commonly used SIPS implemented to enhance the utilization of a transmission corridor. To incorporate the changes expected in system operating conditions into the test system, the UK National Grid's documents, such as the Electricity Ten Year Statement (ETYS) and the System Operability Framework (SOF), were reviewed [81, 88].
6.1.1. Future Energy Scenarios and Wind Generation
In the future, the Power System in Great Britain is expected to involve a significant deployment of wind generation in the North and off-shore. This section describes the generation backgrounds in more detail and outlines the key changes in the generation mix expected over the next 20 years. The future energy scenarios were first defined by the UK government's former Department of Energy & Climate Change (DECC) and are updated annually in consultation with stakeholders. They outline several alternative directions for system development, provide useful information on future system performance, and are termed Consumer Power, Gone Green, Slow Progression and No Progression. The scenarios differ significantly from each other, but all meet the requirement for security of supply with sufficient generation capacity.
A significant increase in the share of capacity associated with renewables can be expected over the next 20 years in all four scenarios. For the scenarios with the highest growth (i.e. Gone Green and Consumer Power), the penetration of renewables could reach 41% in 20 years. Even in the scenario with the slowest progress (i.e. No Progression), 26% of the UK's electricity will come from renewable sources by 2036 to meet the target. At the same time, the proportion of conventional generation from coal and gas continues to decrease in the coming years. The generation mix trend for the Gone Green scenario is illustrated in Figure 6-1; this scenario is normally selected for future system performance studies because of its particularly high penetration of renewable generation and the resulting low system inertia. In particular, wind energy could account for 17.5% of the total electrical generation, approximately 10.5 GW. This high penetration of wind energy therefore encourages the application of generator rejection schemes, which trip non-priority generation (e.g. wind generation) during overloading.
Although the variability of renewable sources can be mitigated by integrating
generation over large areas [89, 90], extreme system conditions can still be encountered. For example, according to [89], 50% of UK wind turbines might experience zero-output events simultaneously, for an average of 100 hours per year. During a winter period with low wind speeds and low temperatures, the expected high load demand could require significant reserve generation capacity. Consequently, the intermittent nature of renewable generation has to be reflected in the risk assessment model.
Figure 6-1: Gone Green Transmission Generation Mix [81]
6.1.2. Load Profiles
Due to the growth in weather-dependent distributed generation (DG), the transmission demand is becoming more variable. Large generators and interconnectors have to be more flexible in order to accommodate this variation. In addition, the use of smart meters raises the prospect that domestic customer demand can respond to changes in supply capacity and in the price of energy. Smart meters also facilitate the prediction of transmission demand, which provides more precise information on how much capability is needed and how often it is required. The proposed risk assessment can then focus on extreme demand conditions.
In 2016, the UK transmission demand varied from 17 GW (i.e. summer minimum) to 52 GW (i.e. winter peak). Typical daily demand profiles for the four
different future energy scenarios are illustrated in Figure 6-2. The variation in transmission demand between the scenarios becomes more noticeable over the coming decades. The growth in distributed solar generation suppresses transmission demand in the middle of the day, when solar output peaks. This effect is most significant on sunny summer days and in the scenarios with the greatest growth in distributed solar generation (i.e. Gone Green, Consumer Power). In the future, the continuing growth in renewable distributed generation is expected to cause larger changes in the magnitude of transmission demand over shorter timespans. This may significantly affect the risk associated with protection schemes, so a more flexible protection strategy is required to adapt to the changing system conditions.
Figure 6-2: Variation in Daily Load Profile for Different Energy Scenarios [81]
6.1.3. Transmission Line Reinforcements
With the continuously changing GB energy landscape described in the previous sections, the National Electricity Transmission System (NETS) will face significant challenges in providing the required transfer capability. Potential deficits in network capacity can arise for the following reasons [81]:
1) Large amount of wind generation connected to the Scottish networks significantly
increases the transmission capacity requirement from Scotland into England.
2) An increasing amount of low carbon generation and interconnectors in the
northern England region increases the export requirement into the English
Midlands.
3) The West Midlands region will have to import more power from distant regions
due to the reduction in conventional generation capacity in the Midlands.
4) Growth in the power transfer from offshore wind generation on the east coast to the southern regions will stress the southern NETS. Meanwhile, the interconnectors are placing increased stress on the Southern English network when exporting power out of GB.
To develop the British network in an efficient, coordinated and economic way, the future requirements, present capability and deficits of the NETS are assessed in the Network Option Assessment (NOA) report [91]. For each defined boundary, transmission expansion options for each energy scenario are determined based on both current transmission capability and future generation integration.
According to the NOA, the “commercial and non-build” option is another way to reinforce transmission capacity. It includes strategies such as: 1) network support from demand side response and distributed resources (e.g. embedded generation, load and storage); 2) active network management of generation and demand, such as inter-trips; and 3) provision of reactive power services at certain locations. However, corresponding reliability requirements and risk assessment procedures are needed to evaluate the possible additional risks caused by these operational strategies.
6.2. Stochastic Risk Assessment Procedures
The continuously changing GB energy landscape will bring significant challenges to system integrity in the coming years. The rapid increase in the proportion of renewables will require utilities to trip non-priority generation, such as gas, coal and wind generation, more frequently to ease transmission congestion. Future Power Systems will therefore need to be equipped with more SIPS, mainly for generation rejection, to ensure that reliability requirements are satisfied and to increase the capability to integrate more renewables.
As reviewed in Chapter 5, most SIPS reliability assessment methods are analytical, which makes them insufficient to reflect the impact of fast-changing system conditions on SIPS operational risks. In addition, wide-area or centralised SIPS, made possible by fast-developing ICT, allow SIPS decisions to be based on measurements collected over a wide area network. This significantly increases the complexity of SIPS operation and also requires the reliability assessment method to address the impact of SIPS on a wide-area system.
In this chapter, a method based on Sequential Monte Carlo Simulation (SMCS) is proposed and, for the first time, applied to assess SIPS operational risk. This method allows dynamic SIPS risk assessment that takes into account the time-varying nature of generation and load demand. SMCS is well suited to time-dependent SIPS events and is used to simulate the time series of the load profile and the wind output. A dynamic annual hourly load profile based on the IEEE-RTS load model and wind farm output profiles predicted using the auto-regressive and moving average (ARMA) model are mapped into the SMCS procedure to reflect the random behaviour of system generation and demand. The probability and frequency of each SIPS operational state obtained from the reliability assessment are mapped into the model to represent the different scheme operational behaviours, e.g. normal operation, DBM and SBM.
As shown in Figure 6-3, the scheme is triggered by an initiating event, normally a critical circuit outage or overload, and the GRS is designed to prevent cascade tripping. The time to a line outage event (Ei) is usually assumed to be exponentially distributed. Once the line outage rate (λline) is known, the probability that the initiating event occurs in the next hour can be obtained as:
$$\Pr(E_i) = 1 - e^{-\lambda_{line} t} \qquad (6\text{-}1)$$
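Eq. (6-1) can be evaluated at each simulation step as in the minimal sketch below. It assumes λline is given in occurrences per year and converted to a per-hour rate using 8760 hours per year; the rate of 0.5 per year is a hypothetical value, and sampling the event by comparing a uniform random number with Pr(Ei) is one common way to use this probability in a Monte Carlo step. The function names are illustrative.

```python
import math
import random

HOURS_PER_YEAR = 8760  # assumption: 365-day year


def initiating_event_probability(rate_per_year: float, dt_hours: float = 1.0) -> float:
    """Eq. (6-1): probability of at least one outage within the next dt_hours."""
    lam_per_hour = rate_per_year / HOURS_PER_YEAR
    return 1.0 - math.exp(-lam_per_hour * dt_hours)


def sample_initiating_event(rate_per_year: float, rng: random.Random,
                            dt_hours: float = 1.0) -> bool:
    """Return True if a uniform draw falls below Pr(E_i) for this step."""
    return rng.random() < initiating_event_probability(rate_per_year, dt_hours)


rng = random.Random(1)
lam_line = 0.5  # hypothetical critical-line outage rate, occurrences per year
p_hour = initiating_event_probability(lam_line)
events = sum(sample_initiating_event(lam_line, rng) for _ in range(100 * HOURS_PER_YEAR))
print(f"Pr(E) per hour = {p_hour:.3e}; sampled events in 100 years = {events}")
```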
[Figure 6-3 flowchart elements: Start; SIPS reliability data Pr(SO), Pr(DBM), Pr(SBM); load profile; wind generation profile; network & SIPS status mapping (at time t); triggering event? (Pr(E)); SIPS operation? (success operation, SIPS DBM, SIPS SBM); DC optimal power flow (OPF); converge?; load shedding; impact assessment (i.e. VOLL, LOWG, energy redispatch etc.); t = t + Δt; SMCS stop criteria; risk assessment: Risk(SO) = Pr(SO) × Im(SO), Risk(DBM) = Pr(DBM) × Im(DBM), Risk(SBM) = Pr(SBM) × Im(SBM); End]
Figure 6-3: Risk Assessment Procedure using SMCS
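The hourly loop of Figure 6-3 might be sketched as below. This is a simplified illustration under stated assumptions, not the thesis implementation: the DC OPF and load-shedding impact assessment is replaced by a caller-supplied impact function, the SIPS state is sampled from Pr(E), Pr(DBM) and Pr(SBM) rather than weighted analytically, and the function names and all numbers in the usage lines are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Sequence


def simulate_year(
    wind_mw: Sequence[float],
    load_mw: Sequence[float],
    pr_event: float,
    pr_dbm: float,
    pr_sbm: float,
    impact: Callable[[str, float, float], float],
    rng: random.Random,
) -> Dict[str, float]:
    """Sample one year hour by hour and total the impact ($) of each SIPS state.

    pr_event -- hourly probability of the triggering event, Pr(E)
    pr_dbm   -- probability that the scheme fails to operate when triggered
    pr_sbm   -- hourly probability of operating without a triggering event
    impact   -- stand-in for the DC OPF / load shedding impact assessment
    """
    totals: Dict[str, float] = defaultdict(float)
    for g, d in zip(wind_mw, load_mw):
        if rng.random() < pr_event:          # triggering event occurs
            state = "DBM" if rng.random() < pr_dbm else "SO"
        elif rng.random() < pr_sbm:          # unwanted operation without a trigger
            state = "SBM"
        else:
            continue                         # no SIPS-related event this hour
        totals[state] += impact(state, g, d)
    return dict(totals)


def toy_impact(state: str, g: float, d: float) -> float:
    """Hypothetical fixed cost per event ($); ignores generation g and load d."""
    return {"SO": 100.0, "DBM": 5000.0, "SBM": 800.0}[state]


year = simulate_year([600.0] * 8760, [2000.0] * 8760,
                     pr_event=1e-3, pr_dbm=0.01, pr_sbm=1e-4,
                     impact=toy_impact, rng=random.Random(7))
print(year)
```

Repeating `simulate_year` with new wind and load samples, and stopping once the convergence criterion is met, gives the year-round procedure in the flowchart.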
The year-round risk assessment produces a set of system conditions with different weather and load levels. The impact of each SIPS operational state over these system conditions is:
$$Impact = \frac{1}{N}\sum_{i=1}^{N} Im(g_i, d_i) = \frac{1}{N}\sum_{i=1}^{N}\left(C_L^i(d_i) + C_G^i(g_i) + C_R^i(g_i, d_i)\right) \qquad (6\text{-}2)$$
where N is the number of samples within a year-round simulation and $Im(g_i, d_i)$ computes the impact of the GRS maloperation as a function of the generation output and the load level. The parameters $C_L^i(d_i)$, $C_G^i(g_i)$ and $C_R^i(g_i, d_i)$ represent the costs associated with load shedding, wind curtailment and generating capacity redispatch.
In each operating hour of the simulated year, the Value of Lost Load (VOLL), the Loss of Wind Generation (LOWG) and the capacity of generation dispatched from other parts of the system (LOWG − VOLL) incurred after each SIPS operation are calculated. The optimal power flow is performed based on the system conditions at the
time i. These are then used to calculate the three parameters using the cost figures given
in Table 5-4:
$$C_L^i(d_i) = VOLL(i) \times t \times 18000 \qquad (6\text{-}3)$$
( ) ( ) 120 50i
G iC g LOWG i t N (6-4)
$$C_R^i(g_i, d_i) = \left(LOWG(i) - VOLL(i)\right) \times t \times 50 \qquad (6\text{-}5)$$
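Eqs. (6-2), (6-3) and (6-5) can be combined into a per-hour impact calculation, sketched below under the assumption that VOLL(i) and LOWG(i) are the MW of load shed and of wind generation lost in hour i, and that t is in hours. The wind curtailment cost $C_G^i(g_i)$ of Eq. (6-4) is passed in as a caller-supplied function (the hypothetical `wind_cost`) rather than hard-coded. All helper names are hypothetical.

```python
from typing import Callable, Sequence

LOAD_SHEDDING_PRICE = 18_000.0  # $/MWh, as in Eq. (6-3)
REDISPATCH_PRICE = 50.0         # $/MWh, as in Eq. (6-5)


def load_shedding_cost(voll_mw: float, t_hours: float = 1.0) -> float:
    """C_L = VOLL(i) * t * 18000, Eq. (6-3)."""
    return voll_mw * t_hours * LOAD_SHEDDING_PRICE


def redispatch_cost(lowg_mw: float, voll_mw: float, t_hours: float = 1.0) -> float:
    """C_R = (LOWG(i) - VOLL(i)) * t * 50, Eq. (6-5)."""
    return (lowg_mw - voll_mw) * t_hours * REDISPATCH_PRICE


def average_impact(lowg_mw: Sequence[float], voll_mw: Sequence[float],
                   wind_cost: Callable[[float], float]) -> float:
    """Eq. (6-2): mean of C_L + C_G + C_R over the N sampled hours."""
    if len(lowg_mw) != len(voll_mw) or not lowg_mw:
        raise ValueError("need equal-length, non-empty samples")
    total = sum(load_shedding_cost(v) + wind_cost(w) + redispatch_cost(w, v)
                for w, v in zip(lowg_mw, voll_mw))
    return total / len(lowg_mw)


# Two hypothetical hours: 100 MW of wind lost with 20 MW shed, then no impact.
print(average_impact([100.0, 0.0], [20.0, 0.0], wind_cost=lambda w: 0.0))  # → 182000.0
```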
Risks introduced by a GRS operation can be calculated by multiplying the probability of
a particular operational state with its corresponding impact:
$$Risk(S) = \sum_{i=1}^{N}\left(\Pr(E_i)\,\Pr(S_i)\,Im(g_i, d_i)\,\Pr(g_i)\,\Pr(d_i)\right) \qquad (6\text{-}6)$$
where N is the number of samples within a year-round simulation, and $\Pr(E_i)$, $\Pr(S_i)$, $\Pr(g_i)$ and $\Pr(d_i)$ represent the probabilities of the initiating event $E_i$, the state $S_i$, the generating output $g_i$ and the load level $d_i$.
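Eq. (6-6) is a weighted sum over the N sampled hours. A direct transcription, with one list entry per hour and purely illustrative inputs, might look as follows; the function name is hypothetical.

```python
from typing import Sequence


def state_risk(pr_event: Sequence[float], pr_state: Sequence[float],
               impact: Sequence[float], pr_gen: Sequence[float],
               pr_load: Sequence[float]) -> float:
    """Eq. (6-6): sum over i of Pr(E_i) Pr(S_i) Im(g_i, d_i) Pr(g_i) Pr(d_i)."""
    columns = (pr_event, pr_state, impact, pr_gen, pr_load)
    if len({len(c) for c in columns}) != 1:
        raise ValueError("every input needs one entry per sampled hour")
    return sum(e * s * im * g * d for e, s, im, g, d in zip(*columns))


# Illustrative two-hour example (all numbers hypothetical).
print(state_risk([0.5, 0.5], [0.1, 0.2], [1000.0, 2000.0], [1.0, 1.0], [1.0, 1.0]))
```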
SMCS is a fluctuating convergence process: the estimated indices approach their “real” values as the simulation proceeds. The expected risk E(Risk) over N sampling hours can be estimated using the following equation:
$$E(Risk) = \frac{1}{N}\sum_{i=1}^{N} Risk_i \qquad (6\text{-}7)$$
The variance of the estimated risk can be obtained by:
$$\sigma^2 = \frac{1}{N(N-1)}\sum_{i=1}^{N}\left[Risk_i - E(Risk)\right]^2 \qquad (6\text{-}8)$$
where $Risk_i$ denotes the sample value of the risk in hour i.
The simulation should be terminated when the estimated reliability indices reach a
specified degree of confidence to achieve a compromise between accuracy and
computation effort. The coefficient of variation (α) is often used as the convergence
criterion in SMCS and is defined as:
$$\alpha = \sigma / E(Risk) \qquad (6\text{-}9)$$
The number of samples required by the SMCS can be determined using either of two stopping rules [59, 92]. The first approach is to use a sequential stopping procedure and to let the
SMCS run until the coefficient of variation (α) reaches the predefined tolerance value.
The second approach is to run a given number of samples and then check if the
coefficient of variation is acceptable. If not, the number of samples can then be
increased. In this simulation process, the SMCS stops when the coefficient of variation
(α) is less than 1%.
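Eqs. (6-7) to (6-9) and the sequential stopping rule can be sketched as follows. The batch function transcribes the equations directly. The sequential version is a minimal sketch that keeps running sums using Welford's algorithm (a swapped-in technique for numerical stability, not necessarily what the thesis used) and, as an assumption, checks α once per simulated year of 8760 hours against the 1% tolerance. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def batch_estimate(risk_samples: Sequence[float]) -> Tuple[float, float, float]:
    """Eqs. (6-7)-(6-9): return (E(Risk), sigma^2, alpha) for N >= 2 samples."""
    n = len(risk_samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(risk_samples) / n
    var = sum((r - mean) ** 2 for r in risk_samples) / (n * (n - 1))
    alpha = math.sqrt(var) / mean if mean else math.inf
    return mean, var, alpha


def run_sequential(sample_hour: Callable[[], float], tol: float = 0.01,
                   check_every: int = 8760,
                   max_hours: int = 10_000_000) -> Tuple[float, float, int]:
    """Sample hourly risks until alpha < tol; returns (E(Risk), alpha, N)."""
    n, mean, m2 = 0, 0.0, 0.0
    while n < max_hours:
        x = sample_hour()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
        if n >= 2 and n % check_every == 0 and mean > 0:
            alpha = math.sqrt(m2 / (n * (n - 1))) / mean
            if alpha < tol:
                return mean, alpha, n
    raise RuntimeError("tolerance not reached within max_hours")


print(batch_estimate([1.0, 2.0, 3.0]))
```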
6.3. System Condition Time-series Model
6.3.1. Wind Forecast Model
Because the power output of a wind farm is intermittent, a wind speed model is required to predict the wind farm output. The auto-regressive and moving average (ARMA) model, a widely used wind speed forecasting technique, is used to represent the time-series behaviour and the probabilistic characteristics of wind speed. The historical hourly wind speed data of an off-shore wind farm in northwest England over the period 1980 to 2010 [93] were used as the basis for predicting future wind speed. The predicted speed data are then converted into a wind generation profile using a wind turbine generator (WTG) model.
[Figure 6-4 flowchart: Historical Wind Speed Data at Wind Farms → Auto Regression and Moving Average Model (ARMA) → Predicted Wind Speed Data → Wind Turbine Model → Wind Power Output]
Figure 6-4: Procedures to Produce Time-Series Wind Farm Output Data
The ARMA model [94] first standardizes the historical wind speed (WS) samples at
each location as:
$$y_t = (WS_t - \mu_t)/\sigma_t \qquad (6\text{-}10)$$
where $\mu_t$ and $\sigma_t$ are the historical mean wind speed and standard deviation respectively.
The time sequential data series set yt is then used to establish the wind speed time series
model:
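$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_n y_{t-n} + \alpha_t - \theta_1 \alpha_{t-1} - \theta_2 \alpha_{t-2} - \cdots - \theta_m \alpha_{t-m} \qquad (6\text{-}11)$$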
where $\phi_i$ and $\theta_j$ are the autoregressive and moving average parameters of the ARMA model, and $\alpha_t$ is a normal white noise process with zero mean and variance $\sigma_a^2$ (i.e. $\alpha_t \in NID(0, \sigma_a^2)$), where NID denotes normally and independently distributed. The simulated wind speed at a particular location and time is then obtained as:
$$V_t = \mu_t + \sigma_t y_t \qquad (6\text{-}12)$$
where $V_t$ is the simulated wind speed at time t based on the historical mean wind speed $\mu_t$ and the standard deviation $\sigma_t$.
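A minimal sketch of Eqs. (6-11) and (6-12) is given below. The ARMA(2,1) order, its coefficients, σa and the constant hourly mean and standard deviation are hypothetical placeholders; fitting them to the historical data [93] is a separate step not shown. The series starts from zero initial conditions without a burn-in period, and negative speeds are clipped to zero, which is a practical assumption rather than part of Eq. (6-12). Function names are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_arma(phi: Sequence[float], theta: Sequence[float], sigma_a: float,
                  hours: int, rng: random.Random) -> List[float]:
    """Eq. (6-11): y_t = sum phi_i y_{t-i} + a_t - sum theta_j a_{t-j}."""
    y: List[float] = []
    a: List[float] = []
    for t in range(hours):
        a_t = rng.gauss(0.0, sigma_a)  # white noise a_t ~ NID(0, sigma_a^2)
        ar = sum(p * y[t - i - 1] for i, p in enumerate(phi) if t - i - 1 >= 0)
        ma = sum(q * a[t - j - 1] for j, q in enumerate(theta) if t - j - 1 >= 0)
        y.append(ar + a_t - ma)
        a.append(a_t)
    return y


def to_wind_speed(y: Sequence[float], mu: Sequence[float],
                  sigma: Sequence[float]) -> List[float]:
    """Eq. (6-12): V_t = mu_t + sigma_t * y_t, clipped at zero."""
    return [max(0.0, m + s * v) for v, m, s in zip(y, mu, sigma)]


rng = random.Random(2024)
y = simulate_arma(phi=[1.1, -0.2], theta=[0.3], sigma_a=0.4, hours=8760, rng=rng)
speeds = to_wind_speed(y, mu=[9.0] * 8760, sigma=[3.5] * 8760)
print(f"mean simulated speed: {sum(speeds) / len(speeds):.2f} m/s")
```

The AR coefficients chosen here give characteristic roots inside the unit circle, so the placeholder process is stationary.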
Finally, the wind output characteristics of a 3MW Vestas V90 wind turbine [95] are
used to translate the predicted wind speed data into the generation output. The nonlinear
relationship between wind speed and the output of the wind turbine is described in
Equation (6-13):
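$$P_t = \begin{cases} 0, & 0 \le V_t < V_{ci} \\ (A + B V_t + C V_t^2)\, P_r, & V_{ci} \le V_t < V_r \\ P_r, & V_r \le V_t < V_{co} \\ 0, & V_t \ge V_{co} \end{cases} \qquad (6\text{-}13)$$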
where $V_{ci}$, $V_r$ and $V_{co}$ represent the cut-in, rated and cut-out wind speeds respectively; $P_r$ represents the rated power of the wind turbine; and A, B and C are the parameters of the wind turbine output characteristic curve. For this study, the cut-in, rated and cut-out speeds of the selected wind turbine are 3.5, 15 and 25 m/s respectively. Hourly wind speeds were repeatedly simulated for 100 yearly samples and then mapped into the risk assessment model. Figure 6-5 shows the probability distributions of the raw wind speed data recorded over 31 years and the characteristic of the
WTG model. The mean wind speed and the
Figure 6-5: Wind Speed Data Distribution and Wind Turbine Model
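Eq. (6-13) can be evaluated as in the sketch below, using the cut-in, rated and cut-out speeds quoted above and a 3 MW rating. The thesis does not list A, B and C, so the code assumes the expressions commonly used in the reliability literature for this quadratic power curve, which give zero output at Vci and rated output at Vr. With these speeds the quadratic dips very slightly below zero just above cut-in, so the result is clipped at zero. Function names are hypothetical.

```python
from typing import Tuple


def curve_coefficients(v_ci: float, v_r: float) -> Tuple[float, float, float]:
    """Assumed A, B, C for the quadratic segment of Eq. (6-13)."""
    k = ((v_ci + v_r) / (2.0 * v_r)) ** 3
    denom = (v_ci - v_r) ** 2
    a = (v_ci * (v_ci + v_r) - 4.0 * v_ci * v_r * k) / denom
    b = (4.0 * (v_ci + v_r) * k - (3.0 * v_ci + v_r)) / denom
    c = (2.0 - 4.0 * k) / denom
    return a, b, c


def wtg_output(v: float, p_r: float = 3.0, v_ci: float = 3.5,
               v_r: float = 15.0, v_co: float = 25.0) -> float:
    """Turbine output (MW) at wind speed v (m/s), following Eq. (6-13)."""
    if v < v_ci or v >= v_co:
        return 0.0                       # below cut-in, or at/after cut-out
    if v >= v_r:
        return p_r                       # rated region
    a, b, c = curve_coefficients(v_ci, v_r)
    return max(0.0, (a + b * v + c * v * v) * p_r)


print([round(wtg_output(v), 3) for v in (3.0, 8.0, 12.0, 15.0, 26.0)])
```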
6.3.2. Power System Load Profile
Electric load profile forecasting is one of the most important tasks in Power System operation. In the future, due to the growth in weather-dependent distributed generation, the British transmission demand will become more variable. SIPS operation therefore has to become more flexible and adaptive in order to accommodate the variation in transmission demand. Consequently, the test model must integrate a dynamic transmission demand profile to simulate more precisely how much generation capability is needed and how often it is required. The IEEE-RTS load model [85], widely used for Power System reliability evaluation, was used to represent the load variation over a year. A profile of hourly loads over a calendar year is created from this model, with the detailed data shown in Appendix C.
Figure 6-6: IEEE RTS Yearly Load Profile
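The structure of the IEEE-RTS load model, in which each hourly load is the annual peak scaled by weekly, daily and hourly percentage factors, can be sketched as below. The factor lists are flat placeholders rather than the RTS tables (the actual values are given in Appendix C), using one 24-hour shape per weekday without the RTS seasonal distinction is a simplification, and 2850 MW is the original IEEE-RTS annual peak before the 40% increase applied in Section 6.4. The function name is hypothetical.

```python
from typing import List, Sequence


def hourly_load_profile(annual_peak_mw: float,
                        weekly_pct: Sequence[float],
                        daily_pct: Sequence[float],
                        hourly_pct: Sequence[Sequence[float]]) -> List[float]:
    """Chronological hourly loads (MW): peak x week% x day% x hour%.

    weekly_pct -- one value per week (% of annual peak)
    daily_pct  -- one value per weekday (% of weekly peak)
    hourly_pct -- 24 values for each weekday (% of daily peak)
    """
    profile: List[float] = []
    for week in weekly_pct:
        for day, day_pct in enumerate(daily_pct):
            for hour_pct in hourly_pct[day]:
                profile.append(annual_peak_mw * (week / 100) * (day_pct / 100) * (hour_pct / 100))
    return profile


# Placeholder factors: 52 weeks x 7 days x 24 hours, flat except a night-time dip.
day_shape = [70.0] * 6 + [100.0] * 18  # hypothetical hourly shape
profile = hourly_load_profile(2850.0, [100.0] * 52, [100.0] * 7, [day_shape] * 7)
print(len(profile), max(profile), min(profile))
```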
6.4. Numerical Illustration of Stochastic SIPS Risk Assessment
In this section, the risk assessment process based on the stochastic method is applied to an event-based GRS implemented in a multi-unit plant of the more complex IEEE 24-bus Reliability Test System [85].
Three modifications were made to the original test system to make it more stressed. The initial system conditions used for the GRS risk assessment are:
1) The load level at each load point is increased by 40%.
2) 3×300MW wind farms are connected to B13.
3) The load at B13 is removed.
In the modified IEEE 24-bus system, L18 and L20 are two critical circuits connecting the integrated wind farms to the customer-side 138 kV system in the lower half of Figure 6-7. Because of the wind farm integration, these two circuits are heavily loaded when the wind farm output and the load level are high. An outage of either critical circuit (i.e. L18 or L20) could result in cascade tripping of the other circuits interconnecting the power plant and the rest of the system. This can lead to the outage of the entire wind farm and to load shedding at some load points, especially when the load level is high. This situation can be alleviated by disconnecting one of the three wind farms when a critical line outage is detected.
Depending on its communication network, the GRS can collect monitoring data from either local substations or substations across the network and use it for decision making. A system-wide SIPS, which is considered to be the future trend [28], is implemented in the test system. Unlike local schemes, a system-wide SIPS can collect measurements from all the substations in the network and use them in a centralised decision-making process. For the studied GRS, the controller uses the breaker status data of L18 and L20 as the activation signal, and the generator output as the arming signal. The status of the circuit breakers on both critical lines can be acquired from the local substation B13 or from substations B11 and B12 at the remote ends of the critical transmission lines. Meanwhile, the power output of the plant at B13 is continuously monitored by power meters, and the GRS is armed only when this output is higher than a certain level.
Figure 6-7: IEEE 24-Bus Reliability Test System with GRS Logic
Figure 6-8 shows how the arming point of the GRS is determined by comparing the system risks without a GRS with those when a reliable GRS is implemented. When the total generation output of the wind farm is below 560 MW, the system meets the ‘N-1’ criterion, so the outage of one critical circuit does not bring any risk to the system. However, when the generation level exceeds 560 MW, the outage of a critical circuit causes cascade tripping and leads to increased risk. When a GRS is implemented, the risks can be controlled and maintained at a relatively low level. However, below 570 MW, arming the GRS causes more frequent wind rejection, so the risk with the GRS exceeds the risk without it. Consequently, the most economical strategy is to arm the GRS when the generation level is above 570 MW.
Figure 6-8: Comparison between System Risks with and without GRS
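The arming and activation conditions described above can be sketched as a simple decision function. It assumes the 570 MW arming level from Figure 6-8 and treats the L18 and L20 breaker statuses as already-validated boolean inputs; how local (B13) and remote (B11, B12) status signals are combined by the voting or vetoing logic is not modelled here. The function names are hypothetical.

```python
ARMING_LEVEL_MW = 570.0  # arming threshold selected from Figure 6-8


def grs_armed(wind_output_mw: float) -> bool:
    """The GRS is armed only when the wind farm output exceeds the arming level."""
    return wind_output_mw > ARMING_LEVEL_MW


def wind_farms_to_trip(wind_output_mw: float, l18_out: bool, l20_out: bool) -> int:
    """Number of wind farms to disconnect: one if armed and a critical line is out."""
    if grs_armed(wind_output_mw) and (l18_out or l20_out):
        return 1
    return 0


for p, l18, l20 in [(600.0, True, False), (550.0, True, True), (800.0, False, False)]:
    print(p, l18, l20, "->", wind_farms_to_trip(p, l18, l20))
```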
6.5. Stochastic Risk Assessment Results
The historical hourly wind speed data of an off-shore wind farm in northwest England over the period 1980 to 2010 [93] were used as the basis for predicting future wind speed. Hourly wind speeds are repeatedly simulated by the ARMA model to obtain a large number of yearly wind speed samples for the wind farm. Figure 6-9 shows the probability density distributions of the historical wind data and of the wind data simulated via the ARMA model. The probability distribution of the wind speed is close to a normal distribution, and the mean (μ) and standard deviation (σ) of the simulated and recorded data are very close.
Figure 6-9: Simulated and Historical Wind Speed Data Probability Density Function
The simulated wind speed data are then mapped into the SMCS model illustrated in Figure 6-3, with the coefficient of variation (α) used as the stopping criterion. As indicated in Figure 6-10, after 876,000 sample hours of simulation (i.e. 100 years), the coefficient of variation of the SIPS risk falls within 1%, which is the error tolerance used in this case. The SMCS therefore stops, and the expected risks of each SIPS design over the 100-year simulation period are used for further study.
Figure 6-10: Coefficient of Variation in SIPS Risk with Simulation Hours
The expected annual risks induced by GRS designs with different tripping logics and communication architectures are shown in Figure 6-11 and Table 6-1. The scheme with full redundancy (i.e. Arch4) and vetoing tripping logic delivers the best overall performance, with a total risk (Risk(DBM) + Risk(SBM)) of 30376 $/year. More specifically, the annual risk from normal scheme operation stays at approximately $9000 for all the
designs. Since it is mainly determined by the operating frequency of the GRS, it is not the main factor in the choice of SIPS design. Risk from GRS DBM is the main contribution to the total cost for most designs, since the impact of GRS DBM (e.g. isolation of the entire wind farm, load shedding) is considerably larger than that of normal operation or SBM.
Different SIPS communication infrastructures also lead to significant variations in the annual risks from scheme DBM and SBM. Regarding the selection of sensor network architectures, the conclusions are similar to those of the analytical assessment in the previous chapter. Duplicating the bay-level process bus (e.g. Arch2 versus Arch1) or using duplicated PRP-based LANs (e.g. Arch3 versus Arch1) can significantly improve dependability, with Risk(DBM) decreasing from approximately 43000 $/year to 30500 $/year. However, redundancy in the communication system also increases the security risk. For example, implementing a duplicated LAN (Arch3) on Arch1 (voting) increases the annual Risk(SBM) from 16682 $/year to 20291 $/year. The use of vetoing logic can effectively decrease the risk of SBM without significantly compromising dependability. For example, the risk of SBM for Arch4 (vetoing) is 11826 $/year compared with 24135 $/year for Arch4 (voting), whilst the risk of DBM for Arch4 (vetoing) is 18551 $/year compared with 16638 $/year for Arch4 (voting).
Figure 6-11: Annual Risks Induced by Different GRS Designs
Table 6-1: Risk Assessment Results for Different GRS Designs
Arch. | Process Bus | Station Bus | Operation Cost ($/year) | Risk (DBM) ($/year) | Risk (SBM) ($/year) | Risk (total) ($/year)
Arch1 (voting) | Single | Single Ring LAN | 8945 | 42865 | 16682 | 59547
Arch1 (vetoing) | Single | Single Ring LAN | 8937 | 43872 | 12060 | 55932
Arch2 (voting) | Duplicated | Single Ring LAN | 9041 | 29283 | 20526 | 49809
Arch2 (vetoing) | Duplicated | Single Ring LAN | 9022 | 31956 | 8217 | 40174
Arch3 (voting) | Single | PRP Ring LAN | 9035 | 30103 | 20291 | 50394
Arch3 (vetoing) | Single | PRP Ring LAN | 9030 | 30824 | 15669 | 46493
Arch4 (voting) | Duplicated | PRP Ring LAN | 9130 | 16638 | 24135 | 40773
Arch4 (vetoing) | Duplicated | PRP Ring LAN | 9117 | 18551 | 11826 | 30376
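As a cross-check of Table 6-1, the short script below uses only the tabulated values. It suggests that the Risk (total) column equals Risk(DBM) + Risk(SBM) to within $1 of rounding, i.e. it does not include the roughly 9000 $/year operation cost, and that Arch4 (vetoing) has the lowest risk whether or not the operation cost is added.

```python
# Table 6-1 values ($/year): operation cost, Risk(DBM), Risk(SBM), Risk(total)
TABLE_6_1 = {
    "Arch1 (voting)":  (8945, 42865, 16682, 59547),
    "Arch1 (vetoing)": (8937, 43872, 12060, 55932),
    "Arch2 (voting)":  (9041, 29283, 20526, 49809),
    "Arch2 (vetoing)": (9022, 31956, 8217, 40174),
    "Arch3 (voting)":  (9035, 30103, 20291, 50394),
    "Arch3 (vetoing)": (9030, 30824, 15669, 46493),
    "Arch4 (voting)":  (9130, 16638, 24135, 40773),
    "Arch4 (vetoing)": (9117, 18551, 11826, 30376),
}

# The tabulated total agrees with Risk(DBM) + Risk(SBM) to within $1 of rounding.
for design, (op, dbm, sbm, total) in TABLE_6_1.items():
    assert abs(dbm + sbm - total) <= 1, design

best_total = min(TABLE_6_1, key=lambda k: TABLE_6_1[k][3])
best_with_op = min(TABLE_6_1, key=lambda k: sum(TABLE_6_1[k][:3]))
print(best_total, "|", best_with_op)  # Arch4 (vetoing) | Arch4 (vetoing)
```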
6.6. Comparison between Local GRS and System Wide GRS
Based on the scheme communication architecture, SIPS applications can be classified into local SIPS and system-wide SIPS. Most existing SIPS are local [17], meaning that the sensing, decision-making and control devices are all located within the same substation. The standalone nature of local SIPS makes coordination between different SIPS difficult to achieve and may lead to an extensive maintenance effort as the number of SIPS implemented in the system increases.
In this section, the risks introduced by a system-wide centralised GRS (C-GRS), as implemented in the studied system, are compared with those of a local GRS. The aim is to evaluate whether a system-wide GRS provides enhanced or reduced reliability performance compared with the local GRS. In Figure 6-12, the risks of the optimal local GRS design (Arch4, voting) are compared with those of the optimal C-GRS design (Arch4, vetoing), with the failure rate of the WAN varied over a wide range to observe its impact on scheme risks. The local GRS is activated by the line outage information collected from the line outage detection system monitoring the status of CB1 and CB2 at substation B13. With all the sensing, decision-making and implementation devices of the local GRS installed in a single substation, its risks, represented by the dashed lines, are not affected by variation in WAN reliability. However, Risk(DBM) and Risk(SBM) of the C-GRS both increase significantly with the WAN failure rate. Taking the assumed WAN failure rate as the base value, the risk of the C-GRS is lower than that of the local GRS when the WAN failure rate is less than 2.8 times the base
value. The security risk of the C-GRS, however, is higher than that of the local GRS when the WAN failure rate exceeds 0.59 times the base value. Nevertheless, the C-GRS achieves a better overall risk under most conditions because of its enhanced dependability.
The assessment results indicate that a centralised SIPS could achieve equal or better performance compared with local schemes under two preconditions. Firstly, the reliability of the WAN should be high, since it directly affects the performance of the C-SIPS. Secondly, because more sources of information allow more tripping options to be designed for the C-SIPS, a suitable tripping logic is needed to balance dependability with security and achieve optimal overall performance. This result should encourage utilities to centralise existing standalone SIPS to achieve enhanced performance and SIPS coordination.
Figure 6-12: Comparison between a Local GRS and a System-Wide GRS
6.7. Impact of Variation in Wind Level on Risk Assessment Results
As illustrated in Chapter 5, the impact of uncertainty in the reliability data on the
risk evaluation results can be effectively assessed via a sensitivity study. The reliability of
the components, the frequency of the triggering event and the system conditions are
considered the main factors affecting the performance of the protection scheme. For a
GRS implemented at a wind farm, the output of the wind farm is an additional factor, since it
affects the scheme's frequency of operation and its operational risks. The wind level at one
location varies significantly throughout the year. Figure 6-13 illustrates the variation in monthly
average wind speed across 100 years, based on the wind data predicted by the ARMA model.
In the UK, the wind level peaks in winter, whilst the minimum wind level occurs in
summer. In particular, the monthly wind speed in January is the highest of the year,
averaging 11.45 m/s, and the highest average monthly output from the wind farm reaches
58.7% of its capacity. By contrast, the wind speed is lowest in June, averaging
7.41 m/s, and the lowest monthly wind output, observed in June, is 15.3% of the total
capacity of the wind farm. Consequently, the monthly power output from a wind farm over
the 100-year simulation period varies from 15.3% to 58.7% of its capacity, with an
average of 32.1%. Although the output is close to the average value most of the time,
the risk of the GRS under extreme conditions needs to be examined.
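Turning wind speeds into the percentage outputs quoted above requires a turbine power curve. The sketch below assumes a simplified piecewise curve (zero below cut-in and above cut-out, a cubic ramp between cut-in and rated speed, rated output in between); the 3, 13 and 25 m/s speeds are assumptions, not the turbine data used in this study, so it will not reproduce the 15.3% to 58.7% range exactly.

```python
def power_fraction(v, v_ci=3.0, v_r=13.0, v_co=25.0):
    """Fraction of rated output at wind speed v (m/s); cut-in/rated/cut-out speeds are assumed."""
    if v < v_ci or v >= v_co:
        return 0.0
    if v >= v_r:
        return 1.0
    # Simplified cubic ramp between cut-in and rated speed
    return (v**3 - v_ci**3) / (v_r**3 - v_ci**3)


def capacity_factor(speeds, **curve):
    """Average output as a fraction of installed capacity over a series of (hourly) speeds."""
    return sum(power_fraction(v, **curve) for v in speeds) / len(speeds)


print(round(capacity_factor([0, 3, 8, 13, 20, 30]), 3))
```

Because the power curve is non-linear, a monthly average output should be computed from the hourly speeds within the month rather than from the monthly average speed alone.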
Figure 6-13: Monthly Average Wind Speed Variation over 100 years
The variation in the monthly average risk of a GRS under each wind scenario (i.e. low,
average and high) is illustrated in Figure 6-14. It can be seen that the operational costs
arising from both normal GRS operation and GRS maloperation (i.e. DBM and SBM)
increase with wind output. The operational risk is therefore expected to increase
significantly in winter, when both the wind level and the load demand are high.
In addition, in this case, the selection of the optimal GRS design is not affected by the
variation in wind level: the risk induced by Arch4 (vetoing) remains the lowest of all
the designs.
Figure 6-14: GRS Risks under Various Average Monthly Wind Levels
6.8. Summary
This chapter first provides an overview of the future GB energy landscape by reviewing
the latest documentation from UK National Grid. The expected significant increase in
intermittent renewable generation makes the system less reliable, and future system
operation will rely more heavily on SIPS. To understand the implications of this reliance,
a more dynamic and accurate SIPS risk assessment method is required.
Recognising the limitations of the analytical risk assessment method introduced in Chapter
5, a method based on Sequential Monte Carlo Simulation (SMCS) is proposed in this
chapter and applied to the evaluation of the reliability of a Generator Rejection Scheme
implemented in a system with a high penetration of wind generation. An ARMA-based
wind output prediction model was developed to forecast the generation output of
the wind farms from historical wind speed data. The time-series generation and
demand models are then mapped into the risk assessment procedure and used to assess
the performance of the GRS under various system conditions.
The study helps determine the optimal arming criteria by comparing the system risk
with and without SIPS implementation. The risks of GRS with different communication
architectures are assessed to determine the optimal design. The designs of the process
bus communication system, the bay sensor network and the station bus communication
system all affect the performance of the GRS and the trade-off between scheme
dependability and security. The number of trials required for the SMCS-based method can be
determined by monitoring the variation in the SIPS risks. Fluctuations in the generation
output of a wind farm lead to a large variation in SIPS operational risks, affecting
the costs associated with normal operation, DBM and SBM. Therefore, precise
wind level prediction and a dynamic SIPS risk assessment method are critical to
effectively forecast and manage the system operational risk.
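One common way of deciding when an SMCS has run enough trials is to track the coefficient of variation of the estimated index, beta = sqrt(Var(X)/n) / mean, and stop once it falls below a tolerance. The sketch below applies this rule to annual risk samples; `sample_annual_risk` is a hypothetical stand-in for one simulated year of the GRS risk model, and the 5% tolerance and minimum of 10 years are assumptions.

```python
import random
import statistics


def run_until_converged(sample_annual_risk, tol=0.05, min_years=10, max_years=100_000):
    """Simulate years until the coefficient of variation of the mean risk drops below tol.

    Returns (mean_risk, coefficient_of_variation, years_simulated).
    """
    samples = []
    beta = float("inf")
    while len(samples) < max_years:
        samples.append(sample_annual_risk())
        n = len(samples)
        if n < min_years:
            continue
        mean = statistics.fmean(samples)
        if mean == 0:
            continue  # beta is undefined until some risk has been observed
        # Standard error of the mean divided by the mean
        beta = (statistics.variance(samples) / n) ** 0.5 / mean
        if beta < tol:
            break
    return statistics.fmean(samples), beta, len(samples)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical annual risk samples: exponential with a mean of 1,000 $/year
    mean, beta, years = run_until_converged(lambda: rng.expovariate(1 / 1000))
    print(f"mean risk {mean:.0f} $/yr, beta {beta:.3f}, after {years} years")
```

Risk samples dominated by rare, high-impact DBM events have a much larger variance than this toy distribution, so the number of simulated years required in practice can be considerably higher.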
The operational performance of a system-wide GRS is compared with that of a local GRS.
The system-wide GRS can provide enhanced performance, provided that the wide area
communication system is sufficiently reliable. Access to wide area information
could significantly assist the estimation of system condition and bring more flexibility
to the SIPS logic processing design. The increased probability of SBM brought by the
additional detection systems implemented for a GRS could be effectively controlled by
using a ‘vetoing’ tripping logic to validate the scheme operation. The introduction of a
wide area communication network into the system could also facilitate condition
monitoring and the coordination of protection schemes, especially in a system with
a high level of wind penetration. This will be further illustrated in Chapter 7.
In the near future, significant amounts of renewable generation will need to be
integrated into the system, which means the system condition will become more
difficult to predict. The variations in system condition also make it difficult for a single
protection scheme logic to suit all system operating conditions.
Therefore, an adaptive SIPS, which changes its operational logic based on wide area
real-time data, needs to be designed to better manage the risk brought by a SIPS
implementation.
Chapter 7
MANAGING THE RISK OF SIPS IN
POWER SYSTEM LONG-TERM PLANNING
7.1. Introduction to electric system planning with SIPS
An increasing number of SIPS are now being implemented at different locations on
power networks and are being used for various control actions. In particular, the
exploitation of wind power expanded quickly in the period 2000-2016 due to
improved and less expensive wind turbine technology, increased fossil fuel prices,
government subsidies and other policy incentives. As a result, greater use of SIPS has
become increasingly attractive in long-term system planning, mainly because of the
relatively low cost of SIPS compared with transmission expansion. Consequently, SIPS
have proliferated widely, resulting in increased operational complexity and a higher
probability of unintended SIPS interactions, which significantly increases the risks
that SIPS bring to the Power System.
This chapter focuses on assessing the operational risk of using SIPS, taking into account
the challenges SIPS may face in the long-term future. Possible unintended SIPS
interactions caused by the growth in the number of SIPS are investigated. Meanwhile, a
Chapter 7: Managing the Risk of SIPS in Power System Long-Term Planning
Page | 184
computational model for SIPS-aided transmission expansion planning is
proposed.
7.1.1. Electric system long-term planning with SIPS
Power System planning involves the systematic assembly and analysis of the new
facilities and equipment (e.g. generation, transmission, distribution) needed to replace
worn-out systems and to satisfy changing electricity demand. Planning
methodologies have been developed for energy supply, transport and demand that
present information to decision-makers so that they can choose the appropriate course
of action. An improved transmission network enables more efficient power transfer
within or between regions. The benefits of building long-term transmission
facilities include:
Greater access to low-cost power.
Reduced transmission congestion costs.
Reduced line losses, dissipated as heat, during electricity transmission.
Improved electricity import and export capability of a region, and built-in
flexibility to absorb new resource allocations.
Composite Power System expansion planning is usually based on a combination of
reliability and economic justifications. Traditional transmission expansion is
reliability-driven and must include a generation and transmission adequacy assessment
that accounts for the uncertainties in generation, the transmission network and load.
The economic factors that need to be considered in transmission planning include
production cost, investment cost, congestion charges and system interruption cost. In
addition, inflation needs to be considered in long-term system planning, as the
cost of materials for building transmission lines and SIPS increases with time.
Due to the rapid growth in wind generation and a relatively slow transmission
expansion rate, utilities may need protection schemes such as SIPS to trip non-priority
generation to alleviate congestion and allow greater access to low-cost power. Thus,
SIPS used for generator rejection could postpone the need for new transmission
facilities and may affect transmission expansion planning. Nevertheless, as described
in the previous chapters, the introduction of SIPS also brings additional risks which need to
be assessed. Therefore, the impact of SIPS on transmission planning decisions and the
resulting additional risks need to be predicted, assessed and considered in the planning
framework.
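The value of postponing a transmission investment with SIPS, and the role of inflation noted above, can be illustrated with a simple present-worth comparison. The sketch below is hypothetical: the line cost, cost-escalation (inflation) rate, discount rate, SIPS cost and annual SIPS risk are illustrative inputs. Deferring a reinforcement by k years changes its present worth from C0 to C0((1+f)/(1+r))^k, and a fuller comparison would subtract the SIPS cost and the discounted additional SIPS risk.

```python
def deferral_benefit(capex_today, years_deferred, inflation, discount):
    """Present-worth saving from deferring a reinforcement by years_deferred.

    The cost escalates at `inflation` per year and future spending is
    discounted at `discount` per year (both are assumptions).
    """
    future_cost = capex_today * (1 + inflation) ** years_deferred
    present_worth = future_cost / (1 + discount) ** years_deferred
    return capex_today - present_worth


def net_benefit_of_sips(capex_today, years_deferred, inflation, discount,
                        sips_cost, annual_sips_risk):
    """Hypothetical net benefit: deferral saving minus SIPS cost and discounted SIPS risk."""
    risk_pw = sum(annual_sips_risk / (1 + discount) ** y
                  for y in range(1, years_deferred + 1))
    return deferral_benefit(capex_today, years_deferred, inflation, discount) - sips_cost - risk_pw


print(round(deferral_benefit(1_000_000, 5, 0.02, 0.08)))
print(round(net_benefit_of_sips(1_000_000, 5, 0.02, 0.08, 50_000, 10_000)))
```

If construction costs escalate as fast as the discount rate, deferral alone yields no present-worth saving, which is why inflation has to be modelled explicitly in long-term planning.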
7.1.2. Challenges in SIPS Coordination
A main contributor to the challenges in SIPS coordination is the high penetration of
SIPS in the Power System, which significantly increases the complexity of system
operation and may lead to a higher probability of undesired interactions between
SIPS. The maloperation of individual SIPS (i.e. Dependability-based Maloperation
or Security-based Maloperation) is another factor, which may contribute to the spreading
of an electrical disturbance and eventually trigger the operation of other SIPS. Both the
Irish incident on 5th August 2005 [10] and the Nordic event on 1st December 2005 [11],
mentioned in Chapter 2, were caused by interactions between neighbouring and
overlapping schemes.
In addition, limited or incomplete system studies that do not fully analyse the
inter-relationship between a newly implemented SIPS and the existing schemes may also
lead to SIPS interactions during normal operation. For example, an unwanted operation of
a generator rejection scheme will cause a reduction in system frequency, which might
accidentally trigger under-frequency load shedding. Consequently, system studies need to
be performed to ensure that the frequency dip caused by the GRS does not trigger the
load-shedding scheme, thereby avoiding this undesirable interaction. Although SIPS are
highly reliable, the rapid growth in their use and the catastrophic impact of a
maloperation highlight the need to include the risks associated with SIPS interactions
in the SIPS risk assessment procedure.
A brief discussion of the increased operational complexity brought by the high
penetration of SIPS is provided by McCalley in [96]. The conceptual relationship
between the number of SIPS and system operational risk is illustrated in Figure 7-1,
which compares the impact of different strategies (i.e. SIPS, transmission upgrade) on
system operational risk. As the load level increases, as indicated by the dotted lines,
the system becomes more stressed. Consequently, if no action is taken, the system
operational risk reaches the limit when the demand exceeds 80 GW at point A. The
implementation of SIPS could effectively reduce the operational risk, keeping it
within the acceptable risk limit for loading levels of up to
100 GW (i.e. point B). However, without a transmission upgrade, the system operational
risk would eventually reach the acceptable limit due to the increased risk associated with
SIPS maloperations and undesirable interactions. With a transmission upgrade, the risks
can be kept within the limit even when the load level is beyond 100 GW.
However, this is an expensive solution and may not be practical in many cases.
Therefore, by combining SIPS and transmission upgrading, a secure and cost-effective
approach can be achieved, as shown by the red line. This avoids a continuous increase in
system operational complexity due to SIPS application, while the construction of new
transmission facilities can still be postponed. Although Figure 7-1 clearly reflects the
conceptual relationship between system operational risk and various system expansion
strategies, a quantitative study taking into account the impact of SIPS interactions and
the generation expansion plan is required. Such a study is proposed in this chapter.
Figure 7-1: Conceptual Relationship between SIPS Number and System Operational
Risks [96]
It was recommended in [97] that SIPS operational complexity be considered in SIPS
reliability assessment. That study illustrated how to use a multi-stage tree approach and
an analytical process to enumerate the transmission planning options and minimise
operational complexity. However, a quantitative method to evaluate the risk caused by SIPS
interactions was not provided. As required in [14], SIPS should be designed to have
dedicated protection relays and communication systems to prevent interactions with
other systems. However, with the increase in the number of SIPS and centralised control,
interactions caused by common mode failures become inevitable. As reviewed in
Chapter 5, most existing SIPS risk assessment methods focus on a single SIPS and are
therefore insufficient to assess the risk of a system with multiple SIPS. Hence, in this
chapter, a method based on multi-level Markov Modelling is proposed to consider all the
possible interaction states between SIPS in neighbouring systems. This helps identify the
worst system conditions that could be caused by SIPS cascading failures and helps adjust
SIPS logic to achieve optimal coordination between different SIPS.
7.2. Risk Assessment Methodologies Considering SIPS Interaction
A brief risk assessment procedure for a SIPS-rich system is illustrated in Figure 7-2.
Compared with the original process, a few modifications have to be included in the risk
assessment procedure. First, a system study is carried out to identify all the SIPS in the
system and their operational logic, communication infrastructure and possible failure
modes. Next, the reliability assessment is modified by including a system-level Markov
Model to determine the probability and frequency of being in various SIPS interaction
scenarios. Finally, the impact assessment has to consider the impact of SIPS cascading
events, which may lead to a severe level of overall system risk.
[Flowchart: Identify all the SIPS in the system and their operational logic → Component-level & system-level Markov Model → Identify SIPS interaction scenarios → Predict operating conditions (wind generation & load) → Sequential Monte Carlo Simulation → Impact and risk assessment → System upgrade/generation integration]
Figure 7-2: Risk Assessment Procedure Considering SIPS Interactions
7.2.1. Description of the System-level Multi-state Markov Model
The reliability assessment process is first modified to include a system-level Markov
Model, designed to determine all the possible interactions between the SIPS and assess
the transitions between these states. Once the individual failure modes of each SIPS are
obtained using the component-level reliability assessment described in Section 5.4.2,
the interactions between SIPS are assessed using a system-level Markov Model. A single
SIPS can be in three operational states (i.e. normal state, DBM and SBM), which means
the total number of operational states in the system is 3^N,
where N is the number of SIPS in the system.
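The growth of the state space, and the grouping used for the simplified model later in this section, can be checked by enumeration. The sketch below lists every combination of N (normal), D (DBM) and S (SBM) for a given number of SIPS and groups the combinations by how many schemes are in each failure mode; keeping at most two simultaneous maloperations yields six groups, matching S1 to S6 of the simplified model. The helper names are illustrative and not part of the original method.

```python
from collections import Counter
from itertools import product

MODES = ("N", "D", "S")  # normal, dependability-based and security-based maloperation


def enumerate_states(n_sips):
    """All 3**n_sips joint operational states, one mode per SIPS."""
    return list(product(MODES, repeat=n_sips))


def aggregate(states, max_failed=2):
    """Group joint states by (number in D, number in S), keeping at most max_failed failures."""
    groups = Counter()
    for state in states:
        n_d, n_s = state.count("D"), state.count("S")
        if n_d + n_s <= max_failed:
            groups[(n_d, n_s)] += 1
    return groups


print(len(enumerate_states(2)), dict(aggregate(enumerate_states(2))))
```

For two SIPS the nine joint states collapse to six groups: (0,0) is State 1, (1,0) covers States 2 and 3, (0,1) covers States 4 and 5, and (2,0), (1,1) and (0,2) are the Level 2 states.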
[Diagram: Level 0: State 1 (SIPS-1: N, SIPS-2: N). Level 1: State 2 (SIPS-1: D, SIPS-2: N), State 3 (SIPS-1: N, SIPS-2: D), State 4 (SIPS-1: N, SIPS-2: S), State 5 (SIPS-1: S, SIPS-2: N). Level 2: State 6 (D, D), State 7 (D, S), State 8 (S, D), State 9 (S, S). Transitions are labelled with the failure rates λ1d, λ1s, λ2d, λ2s, the repair rates μ1d, μ1s, μ2d, μ2s, and the common-mode rates λcd, λcs, μcd, μcs.]
Figure 7-3: System-level Markov Model to Assess Interaction between Two SIPS
A system-level Markov Model used to assess a system consisting of two schemes (i.e.
SIPS-1 and SIPS-2) is shown in Figure 7-3. A total of 9 operational states (3^2 = 9)
are considered in the model. State 1 (Level 0) is the most common and ideal operational
state, since both SIPS are in the normal operating mode (N) and contingencies at
different locations in the system can be effectively mitigated by the corresponding SIPS.
States 2-5 (Level 1) are the most common cases of SIPS maloperation, with only one
scheme maloperating while the other is in normal operational mode. The Level 1
states do not usually result in cascading failures, because the system is normally
designed to withstand the outage of any single system component and the other SIPS can
be used as backup protection to prevent the spreading of the failure. However, under
severe conditions, the initial scheme failure may change the power flow and generation
output, in which case the operation of the other scheme may be inappropriate under the new
system conditions. States 6 to 9 (Level 2) are the states with the most severe
consequences for system operation, with both SIPS in a failure state (either DBM or
SBM). Maloperation of one scheme may disturb other parts of the system and may require
the operation of the other scheme. If the other SIPS is also in its failure mode, the
Level 2 operational states could have serious cascading consequences.
The failure rates (λd, λs, λcd, λcs) and repair rates (μd, μs, μcd, μcs) in the Markov
model drive the transitions from one state to another. Although a SIPS needs to be
designed independently of other schemes, common mode failures may still exist between
different SIPS due to inadequate design or a high level of centralisation in the
SIPS communication system. The common mode failures (λcd, λcs), which lead to
simultaneous failure of both schemes, can directly cause a transition from the Level 0
state to a Level 2 state. For example, a dependability failure of the WAN could cause
DBM of both SIPS in the system. It is assumed that a common mode component failure has
the same impact on both SIPS; therefore, direct transitions from State 1 to State 7 or
State 8 (one DBM and one SBM) are neglected in the Markov Model.
To simplify the system-level Markov Model and represent the operational states of a
system with more than two schemes, it is assumed that all the SIPS have identical
failure and repair rates (i.e. λ1d=λ2d, λ1s=λ2s, μ1d=μ2d, μ1s=μ2s). This allows the
9-state Markov Model to be simplified into the 6-state Markov Model depicted in Figure 7-4.
[Diagram: Level 0: S1 (all SIPS: N). Level 1: S2 (1 SIPS: D), S3 (1 SIPS: S). Level 2: S4 (2 SIPS: D), S5 (1 SIPS: D, 1 SIPS: S), S6 (2 SIPS: S). Failure transitions Λ12, Λ13, Λ14, Λ16, Λ24, Λ25, Λ35, Λ36; repair transitions M21, M31, M41, M61, M42, M52, M53, M63.]
Figure 7-4: Simplified System-level Markov Model for a System with Multiple SIPS
The stochastic transition probability matrix of the simplified 6-state Markov Model is
given by:
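$$
\mathbf{Pr} =
\begin{bmatrix}
1-(\Lambda_{12}+\Lambda_{13}+\Lambda_{14}+\Lambda_{16}) & \Lambda_{12} & \Lambda_{13} & \Lambda_{14} & 0 & \Lambda_{16} \\
M_{21} & 1-(M_{21}+\Lambda_{24}+\Lambda_{25}) & 0 & \Lambda_{24} & \Lambda_{25} & 0 \\
M_{31} & 0 & 1-(M_{31}+\Lambda_{35}+\Lambda_{36}) & 0 & \Lambda_{35} & \Lambda_{36} \\
M_{41} & M_{42} & 0 & 1-(M_{41}+M_{42}) & 0 & 0 \\
0 & M_{52} & M_{53} & 0 & 1-(M_{52}+M_{53}) & 0 \\
M_{61} & 0 & M_{63} & 0 & 0 & 1-(M_{61}+M_{63})
\end{bmatrix}
\tag{7-1}
$$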
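Assuming there are N SIPS in the system, the equivalent transition rates of the
simplified Markov Model can be approximated by summing the failure rates and
averaging the repair rates that lead to each new operational state:
$$
\begin{aligned}
&\text{Failure rates:} && \text{Repair rates:} \\
&\Lambda_{12} = N\lambda_d, && M_{21} = \mu_d, \\
&\Lambda_{13} = N\lambda_s, && M_{31} = \mu_s, \\
&\Lambda_{14} = C_N^2\,\lambda_{cd}, && M_{41} = \mu_{cd}, \\
&\Lambda_{16} = C_N^2\,\lambda_{cs}, && M_{61} = \mu_{cs}, \\
&\Lambda_{24} = (N-1)\lambda_d, && M_{42} = 2\mu_d, \\
&\Lambda_{25} = (N-1)\lambda_s, && M_{52} = \mu_s, \\
&\Lambda_{35} = (N-1)\lambda_d, && M_{53} = \mu_d, \\
&\Lambda_{36} = (N-1)\lambda_s, && M_{63} = 2\mu_s
\end{aligned}
\tag{7-2}
$$
where $C_N^2 = N(N-1)/2$ is the number of possible pairs of SIPS.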
The probabilities of being in each operational state of the Markov Model after m time
intervals, $\Pr(m)$, and the frequency of encountering each state, $f(S)$, can be
calculated as follows:
$$ \Pr(m) = \Pr(0)\,\mathbf{Pr}^{m} \tag{7-3} $$
$$ f(S) = \Pr(S)\,\lambda_d(S) = \Pr(\bar{S})\,\lambda_e(S) \tag{7-4} $$
$$ f(S_1) = \Pr(S_1)(\Lambda_{12}+\Lambda_{13}+\Lambda_{14}+\Lambda_{16}) = \Pr(S_2)M_{21} + \Pr(S_3)M_{31} + \Pr(S_4)M_{41} + \Pr(S_6)M_{61} \tag{7-5} $$
$$ f(S_2) = \Pr(S_2)(\Lambda_{24}+\Lambda_{25}+M_{21}) = \Pr(S_1)\Lambda_{12} + \Pr(S_4)M_{42} + \Pr(S_5)M_{52} \tag{7-6} $$
$$ f(S_3) = \Pr(S_3)(\Lambda_{35}+\Lambda_{36}+M_{31}) = \Pr(S_1)\Lambda_{13} + \Pr(S_5)M_{53} + \Pr(S_6)M_{63} \tag{7-7} $$
$$ f(S_4) = \Pr(S_4)(M_{41}+M_{42}) = \Pr(S_1)\Lambda_{14} + \Pr(S_2)\Lambda_{24} \tag{7-8} $$
$$ f(S_5) = \Pr(S_5)(M_{52}+M_{53}) = \Pr(S_2)\Lambda_{25} + \Pr(S_3)\Lambda_{35} \tag{7-9} $$
$$ f(S_6) = \Pr(S_6)(M_{61}+M_{63}) = \Pr(S_1)\Lambda_{16} + \Pr(S_3)\Lambda_{36} \tag{7-10} $$
where $\Pr(S)$ and $\Pr(\bar{S})$ are the probabilities of being and not being in the
operational state S, $\lambda_d(S)$ represents the rate of departure from state S and
$\lambda_e(S)$ represents the rate of encountering state S. The reliability assessment
results are then integrated into the risk assessment procedure to evaluate the overall
operational risks, including SIPS maloperation and undesirable interaction.
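The simplified model can be evaluated numerically in a few lines. The sketch below builds the one-step matrix of Equation (7-1) from the rates of Equation (7-2), assuming a one-hour step so that each hourly rate is used directly as a transition probability; it then propagates Pr(m) = Pr(0)·Pr^m as in Equation (7-3) and computes state frequencies from the departure rates as in Equation (7-4). All numerical rates are hypothetical placeholders, not the values behind Table 7-1.

```python
from math import comb


def transition_matrix(n, lam_d, lam_s, lam_cd, lam_cs, mu_d, mu_s, mu_cd, mu_cs):
    """One-step matrix for states S1..S6 (indices 0..5) using the rates of Equation (7-2)."""
    L12, L13 = n * lam_d, n * lam_s
    L14, L16 = comb(n, 2) * lam_cd, comb(n, 2) * lam_cs
    L24, L25 = (n - 1) * lam_d, (n - 1) * lam_s
    L35, L36 = (n - 1) * lam_d, (n - 1) * lam_s
    M21, M31, M41, M61 = mu_d, mu_s, mu_cd, mu_cs
    M42, M52, M53, M63 = 2 * mu_d, mu_s, mu_d, 2 * mu_s
    P = [
        [0.0, L12, L13, L14, 0.0, L16],
        [M21, 0.0, 0.0, L24, L25, 0.0],
        [M31, 0.0, 0.0, 0.0, L35, L36],
        [M41, M42, 0.0, 0.0, 0.0, 0.0],
        [0.0, M52, M53, 0.0, 0.0, 0.0],
        [M61, 0.0, M63, 0.0, 0.0, 0.0],
    ]
    for i, row in enumerate(P):
        row[i] = 1.0 - sum(row)  # diagonal term keeps each row summing to one
    return P


def step(pr, P):
    """Row vector times matrix: Pr(m+1) = Pr(m) · P."""
    return [sum(pr[i] * P[i][j] for i in range(len(pr))) for j in range(len(P))]


def state_probabilities(P, m):
    """Pr(m) starting from all SIPS in normal operation (State S1)."""
    pr = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(m):
        pr = step(pr, P)
    return pr


def frequencies(pr, P):
    """f(S) = Pr(S) x departure rate (Equation (7-4)), per one-hour step."""
    return [pr[i] * (1.0 - P[i][i]) for i in range(len(pr))]


if __name__ == "__main__":
    # Hypothetical hourly rates: failures per hour, repairs as 1 / mean repair time
    P = transition_matrix(2, 1e-4, 1e-5, 1e-6, 1e-7, 1 / 8, 1 / 4, 1 / 24, 1 / 24)
    pr = state_probabilities(P, 2000)
    print([f"{p:.3e}" for p in pr])
    print([f"{f:.3e}" for f in frequencies(pr, P)])
```

Once Pr(m) has converged, the flows satisfy the balance relations of Equations (7-5) to (7-10), which is a convenient check on any implementation.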
7.2.2. Modified Impact Assessment Procedure
The modified impact assessment is performed in two steps. First, the impact of each
SIPS operation after a triggering event is evaluated, and the changes in power flow and
generation output brought by the SIPS’s remedial actions or maloperations are updated
in the test system. Next, the impact of the initial SIPS operation or maloperation on
the other SIPS in the system is investigated: a system study is carried out to determine
whether the initial scheme would trigger the operation of other schemes in the system
and therefore cause an additional impact on system reliability.
Due to the complexity of SIPS operational conditions, Sequential Monte Carlo
Simulation (SMCS) is used to assess the impact of each SIPS operational state under
various system conditions. A set of different system operating conditions is predicted
based on both historical data and the simulation models. Extreme scenarios, such as
severe weather conditions or high load demand, have to be included and emphasised,
since they may lead to cascading SIPS operation. A dynamic annual hourly load profile
based on the IEEE-RTS load model and wind farm output profiles created using the ARMA
model are integrated into the SMCS to reflect the random behaviour of the system.
The impact of each of the 9 operational states is evaluated using various impact indices
and can be expressed as:
$$ Im(g_i, d_i) = C_L^i(d_i) + C_G^i(g_i) + C_R^i(g_i, d_i) + C_S^i(g_i, d_i) \tag{7-11} $$
where $Im(g_i, d_i)$ computes the impact of different interactions as a function of
generation output $g_i$ and load demand $d_i$, and the parameters $C_L^i(d_i)$,
$C_G^i(g_i)$, $C_R^i(g_i, d_i)$ and $C_S^i(g_i, d_i)$ are the costs associated with load
shedding, wind curtailment, generating capacity redispatch and restart of tripped
generators or wind farms, respectively.
The risk introduced to the system is calculated as the product of the probability or
frequency of each SIPS interaction state and its impact on the system. The risk of the
DBM-related operational states, $Risk_{DBM}(S)$, and the risk of the SBM-related states,
$Risk_{SBM}(S)$, can be calculated as:
$$ Risk_{DBM}(S) = \sum_{i=1}^{N} \Pr(E_i)\,\Pr(S)\,Im(g_i, d_i)\,\Pr(g_i)\,\Pr(d_i) \tag{7-12} $$
$$ Risk_{SBM}(S) = \sum_{i=1}^{N} Fr(S)\,Im(g_i, d_i)\,\Pr(g_i)\,\Pr(d_i) \tag{7-13} $$
where N here denotes the number of samples within the simulation period. The parameters
$\Pr(E_i)$, $\Pr(S)$, $\Pr(g_i)$ and $\Pr(d_i)$ represent the probabilities of the
initiating event $E_i$, the state S, the generation output $g_i$ and the load demand
$d_i$, and $Fr(S)$ is the frequency of encountering the SBM states.
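Equations (7-12) and (7-13) are probability-weighted (or frequency-weighted) sums of impacts over the simulated operating conditions. The sketch below is a minimal version, assuming each simulated sample has already been tagged with the probabilities of its initiating event, generation level and load level and with its impact from Equation (7-11); the `Sample` record, its field names and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pr_event: float  # Pr(E_i), probability of the initiating event
    pr_gen: float    # Pr(g_i), probability of the generation output level
    pr_load: float   # Pr(d_i), probability of the load level
    impact: float    # Im(g_i, d_i) in $, e.g. from Equation (7-11)


def risk_dbm(samples, pr_state):
    """Equation (7-12): risk of a DBM-related state with probability pr_state."""
    return sum(s.pr_event * pr_state * s.impact * s.pr_gen * s.pr_load for s in samples)


def risk_sbm(samples, fr_state):
    """Equation (7-13): risk of an SBM-related state encountered with frequency fr_state."""
    return sum(fr_state * s.impact * s.pr_gen * s.pr_load for s in samples)


samples = [Sample(0.01, 0.5, 0.5, 1e6), Sample(0.01, 0.5, 0.5, 2e6)]
print(risk_dbm(samples, 1e-3), risk_sbm(samples, 1e-4))
```

The initiating-event probability appears only in the DBM term because a failure to operate matters only when the scheme is called upon, whereas a spurious operation causes its impact whenever the SBM state is encountered.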
7.3. Numerical Illustration of the Method
To evaluate the effectiveness of the risk assessment method, the PJM 5-bus system [98]
was used to illustrate the impact of SIPS maloperations and undesirable interactions on
system integrity. All the transmission lines in the system are assumed to have an
identical thermal rating of 400 MVA. The cost and MW limit of each generator are
shown in Figure 7-5. Two wind farms were initially integrated at buses B1 and B5,
each with an installed capacity of 100 MW. Owing to its low cost, wind generation is
given exclusive priority in the system. Power is transferred from the relatively
low-cost generation centre to the load centre. However, the installation of the wind
farms stresses the connecting transmission lines, especially when the wind farm
outputs or the load levels are high. L1, L2 and L6 are the three heavily loaded lines
connecting the main generation to the load.
The ‘N-1’ criterion may not be satisfied under stressed system conditions. When there
is a permanent fault on one of these critical circuits, the associated protection will trip
the circuit breaker to clear the fault on the line. An outage of any one of the three
critical lines could lead to the power flow on the other two lines exceeding 400 MVA and
therefore initiate cascading trips, which would isolate the generation centre (B1 and B5)
from the load centre (B2, B3 and B4). This may eventually lead to a higher generation
cost through the use of G2 and G3. When load demand is high, disconnection of customers
at B2, B3 and B4 might be required. However, instead of upgrading the transmission
network (i.e. building a new line), SIPS could be used to maximise the power transfer
to the load centre whilst maintaining system reliability.
[Diagram: generation centre (buses B1 and B5, with WF1: 100 MW and WF2: 100 MW) and load centre (buses B2, B3 and B4), connected by lines L1 to L6, with generators G1 to G4; the diagram labels include 13, 15, 30 and 40 $/MWh, 200 MW, 300 MW and 400 MW.]
Figure 7-5: Modified PJM 5-bus System with Wind Farms
SIPS-1 and SIPS-2 could be implemented at WF1 and WF2 respectively to enhance
system integrity. When the wind speed at WF1, and consequently its output, is high,
SIPS-1 is armed to continuously monitor the status of L1 and L2. Under stressed system
conditions, an outage of either of the two lines will overload the other line, with a
possibility of cascade tripping. In this case, SIPS-1 is designed to disconnect WF1
from the system to relieve the overloaded line and prevent cascade tripping and the
isolation of all the generation plants at B1. When the output of WF1 is low and the
output of WF2 is high, the operation of SIPS-1 may not be sufficient to relieve the
overloading of L1 or L2. Consequently, the operation of SIPS-2 is required to
disconnect WF2 from service as backup protection. The initiating event for SIPS-2 is
an outage of L3 or L6. Following an outage of L6, and assuming high load demand and
low wind speed at WF2, L1 and L2 could remain heavily loaded even after the operation
of SIPS-2. Therefore, the operation of SIPS-1 may be required following the operation
of SIPS-2. In this system, the operation of one SIPS
affects the other schemes; therefore, the interaction scenarios need to be studied. A
case study involving different SIPS designs was used to evaluate the performance of a
SIPS-rich system.
7.3.1. Reliability Assessment Results
Knowing the operational logic of the SIPS in the studied system, the individual
operational modes and possible interconnections between the schemes can be identified.
With each scheme having 3 operational states (i.e. N, DBM and SBM), a total of 9 (3^2)
states are considered. Assuming both schemes have the same design, and consequently
the same failure and repair rates, the simplified 6-state Markov Model is used for the
reliability assessment.
Based on the reliability assessment results discussed in Chapter 3, the mean
time to failure (MTTF) of the circuit breakers, the process bus and merging units, the
IEDs, the LAN and the WAN is 100, 59.9, 100, 63.8 and 50 years respectively. The
probabilities of the different SIPS states and interactions are assessed using the Markov
Model described in Equations (7-3) to (7-10). The probability of being in each
“failure to operate” state and the frequency of encountering a spurious operation state in
the next hour for each SIPS design are estimated and recorded in Appendix D.
Table 7-1 compares the reliability of implementing SIPS with different levels of
redundancy and tripping logic in the studied system. The probabilities of being in the
DBM-related states (i.e. S2, S4 and S5) can be reduced by providing redundancy in the
communication system. For example, by duplicating the substation process bus and
station bus system, the probability of being in S4 for the scheme using voting logic is
reduced from 7.98×10^-4 to 5.01×10^-4. Moreover, using the logic solver to validate the
decisions made by the redundant line-outage detection systems before issuing a trip
decision leads to less frequent entry into the SBM states. This improvement in SBM
is more pronounced in a SIPS design with a higher level of redundancy (e.g. Arch4).
However, it also inevitably increases the probability of being in States S2 (D&N),
S4 (D&D) and S5 (D&S). For example, when vetoing logic is used instead of voting
logic, the probability of Arch4 being in State S2 (D&N) increases from 5.39×10^-3
to 4.60×10^-2.
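The dependability/security trade-off between the two tripping logics can be seen in a two-channel example. For illustration only, 'voting' is modelled here as a 1-out-of-2 arrangement (either redundant line-outage detection channel can initiate a trip) and 'vetoing' as 2-out-of-2 (the logic solver trips only if both channels agree). The channels are assumed independent and the per-demand probabilities are hypothetical. The architectures studied in this thesis contain many more components, so these numbers do not correspond to Table 7-1.

```python
def one_out_of_two(p_fail, p_spurious):
    """Trip if either channel trips: fails only if both fail; spurious if either is spurious."""
    return p_fail ** 2, 1 - (1 - p_spurious) ** 2


def two_out_of_two(p_fail, p_spurious):
    """Trip only if both channels agree: fails if either fails; spurious only if both are."""
    return 1 - (1 - p_fail) ** 2, p_spurious ** 2


for name, logic in (("1oo2 (voting)", one_out_of_two), ("2oo2 (vetoing)", two_out_of_two)):
    fail, spurious = logic(1e-2, 1e-3)
    print(f"{name}: P(fail to trip) = {fail:.2e}, P(spurious trip) = {spurious:.2e}")
```

Under these assumptions the 2-out-of-2 arrangement lowers the spurious-trip probability by orders of magnitude while raising the failure-to-trip probability, mirroring the direction of the changes observed for Arch4 in Table 7-1.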
Table 7-1: Probability of each Operational State in a System with Two SIPS
State | Index | Arch1 (voting) | Arch1 (vetoing) | Arch4 (voting) | Arch4 (vetoing)
(Arch1: single process/station bus; Arch4: duplicated process/station bus)
Normal (Level 0)
S1 N&N | Pr | 9.68×10^-1 | 9.54×10^-1 | 9.94×10^-1 | 9.53×10^-1
1 Maloperation (Level 1)
S2 D&N | Pr | 3.0×10^-2 | 4.52×10^-2 | 5.39×10^-3 | 4.60×10^-2
S3 S&N | Fr | 1.75×10^-6 | 1.38×10^-5 | 2.39×10^-5 | 1.36×10^-5
2 Maloperations (Level 2)
S4 D&D | Pr | 7.98×10^-4 | 9.87×10^-4 | 5.01×10^-4 | 1.00×10^-3
S5 D&S | Pr | 5.94×10^-6 | 7.03×10^-6 | 1.40×10^-6 | 7.06×10^-6
S6 S&S | Fr | 4.46×10^-7 | 4.38×10^-7 | 4.60×10^-7 | 4.37×10^-7
(Pr: probability; Fr: frequency of encounter)
Although the probability of two schemes simultaneously being in a failure state (i.e.
S4, S5 and S6) is much lower than that of the other states, and is small in the studied
system with two schemes, the probability of unintended interaction between SIPS may
increase dramatically with the number of schemes in the system. As per Equation (7-2),
the transition rates in the Markov Model increase with the number of SIPS in the system
(N), and the common-mode terms grow with the number of SIPS pairs. The variation in the
probability of interactions with the number of SIPS in the system for SIPS design Arch4
(voting) is given in Table 7-2. For example, the probability of being in State 4
(D&D), i.e. two dependability-based maloperations, increases from 0.05% to 2.1% as
the number of SIPS in the system increases from 2 to 10. Therefore, it is essential
to evaluate the variation in SIPS risk as the number of SIPS increases.
Table 7-2: Variation in the Probability of Interactions between SIPS for Arch4(voting)
No. of SIPS | S4 D&D (probability) | S5 D&S (probability) | S6 S&S (frequency)
2 | 5.01×10^-4 | 1.40×10^-6 | 4.60×10^-7
3 | 1.50×10^-3 | 4.96×10^-6 | 1.37×10^-6
4 | 2.98×10^-3 | 1.14×10^-5 | 2.73×10^-6
5 | 4.93×10^-3 | 2.13×10^-5 | 4.51×10^-6
6 | 7.34×10^-3 | 3.53×10^-5 | 6.70×10^-6
7 | 1.02×10^-2 | 5.41×10^-5 | 9.27×10^-6
8 | 1.34×10^-2 | 7.80×10^-5 | 1.22×10^-5
9 | 1.71×10^-2 | 1.07×10^-4 | 1.55×10^-5
10 | 2.10×10^-2 | 1.43×10^-4 | 1.91×10^-5
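Equation (7-2) suggests why S4 grows so quickly: the common-mode rate into S4 scales with the number of SIPS pairs, C(N,2), and the independent path through S2 scales with N(N-1), which is also proportional to C(N,2). The short check below uses only the values in Table 7-2: it divides each tabulated S4 probability by the N = 2 value and compares the ratio with C(N,2).

```python
from math import comb

# S4 (D&D) probabilities for Arch4 (voting), copied from Table 7-2
S4_PROB = {2: 5.01e-4, 3: 1.50e-3, 4: 2.98e-3, 5: 4.93e-3, 6: 7.34e-3,
           7: 1.02e-2, 8: 1.34e-2, 9: 1.71e-2, 10: 2.10e-2}


def growth_vs_pairs(table):
    """Return {N: (growth relative to N = 2, number of SIPS pairs C(N,2))}."""
    base = table[2]
    return {n: (p / base, comb(n, 2)) for n, p in table.items()}


for n, (growth, pairs) in growth_vs_pairs(S4_PROB).items():
    print(f"N={n:2d}: growth x{growth:5.1f}, pairs {pairs:2d}, ratio {growth / pairs:.2f}")
```

The tabulated growth stays within about 7% of C(N,2), so the probability of a double dependability-based maloperation grows roughly quadratically with the number of schemes.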
7.3.2. Impact Assessment Results
The impact of the previously discussed 6 different SIPS operational states in the PJM
5-bus system is investigated under various system conditions. To illustrate the
variation in SIPS impact, numerous case scenarios are generated by changing the load
level and the generation output of each wind farm. Specifically, the output of each
wind farm is set to a low level (0 MW), a medium level (50 MW) or a high level
(100 MW), and load levels of 100% and 80% of the peak load are illustrated.
All the possible consequences of SIPS operation and their financial impacts [99] are
listed in Table 7-3. In this study, it is assumed that restoring a wind farm after a SIPS
operation takes 2 hours, and that restoring load shed as a result of cascading failures
takes 5 hours. The impact of each SIPS state at a particular system condition
$(g_i, d_i)$ can be calculated as:
$$
\begin{aligned}
\mathrm{Im}(g_i, d_i) &= C_L(d_i) + C_G(g_i) + C_R(g_i, d_i) + C_S(g_i, d_i) \\
&= VOLL(i)\cdot 18000\cdot t + LOWG(i)\cdot 120\cdot t + \big(LOWG(i) + VOLL(i)\big)\cdot 50\cdot t + C_{GS}
\end{aligned}
\tag{7-13}
$$
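where VOLL(i) is the lost load, valued at the value of lost load (18,000 $/MWh), LOWG(i)
is the loss of wind generation, C_GS is the generator start-up cost and t represents the
duration of the impact on the system. C_L, C_G, C_R and C_S are the costs of load
shedding, loss of wind generation, energy re-dispatch and generator start-up,
respectively.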
Table 7-3: Impact Assessment Data of Different SIPS Operations [99]
State                  Cost Item             Duration (hrs)   Cost
SIPS operation/SBM     Wind curtailment      2                120 $/MWh
SIPS operation/SBM     Wind farm re-start    -                10,000 $/case
SIPS operation/SBM     Re-dispatch           2                50 $/MWh
SIPS DBM               Load shedding         5                18,000 $/MWh
SIPS DBM               Generator start-up    -                5,000 $/case
SIPS DBM               Energy re-dispatch    5                50 $/MWh
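Equation (7-13) can be evaluated directly once the size of the lost load and lost wind generation is known. The Python sketch below is a minimal illustration under stated assumptions: it reads VOLL(i) and LOWG(i) as quantities in MW (so that MW × $/MWh × h gives $), applies the single duration t to every term exactly as the equation is written, and uses a hypothetical 100 MW wind trip as the example. The function name `state_impact` is hypothetical.

```python
# Cost rates from Table 7-3, as used in Eq. (7-13)
LOAD_SHED_COST = 18_000.0   # $/MWh, value of lost load
WIND_LOSS_COST = 120.0      # $/MWh, loss of wind generation
REDISPATCH_COST = 50.0      # $/MWh, energy re-dispatch


def state_impact(lost_load_mw: float, lost_wind_mw: float,
                 duration_h: float, start_up_cost: float = 0.0) -> float:
    """Financial impact ($) of one SIPS state, following Eq. (7-13).

    lost_load_mw  -- the VOLL(i) term, read here as disconnected load in MW
    lost_wind_mw  -- the LOWG(i) term, read here as lost wind generation in MW
    duration_h    -- t, duration of the impact in hours (one t for all terms,
                     as written in Eq. (7-13))
    start_up_cost -- C_GS, generator start-up cost in $
    """
    if min(lost_load_mw, lost_wind_mw, duration_h, start_up_cost) < 0:
        raise ValueError("inputs must be non-negative")
    c_load = lost_load_mw * LOAD_SHED_COST * duration_h               # C_L
    c_wind = lost_wind_mw * WIND_LOSS_COST * duration_h               # C_G
    c_redispatch = (lost_wind_mw + lost_load_mw) * REDISPATCH_COST * duration_h  # C_R
    return c_load + c_wind + c_redispatch + start_up_cost             # C_S = C_GS


if __name__ == "__main__":
    # Hypothetical case: 100 MW of wind tripped for 2 h, no load lost
    print(state_impact(lost_load_mw=0.0, lost_wind_mw=100.0, duration_h=2.0))
```

For the hypothetical 100 MW wind trip over 2 hours this gives 34,000 $ (24,000 $ of lost wind plus 10,000 $ of re-dispatch). Any shed load quickly dominates the result because of the 18,000 $/MWh rate, which is consistent with the much larger DBM impacts discussed below.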
The impacts of the studied SIPS operational states under particular system conditions
are illustrated in Figure 7-6: the DBM-related states are shown in Figure 7-6 (a), whilst
normal operation and the SBM-related states are shown in Figure 7-6 (b). The impacts
caused by DBM-related states are considerably larger than those of normal operation and
SBM-related states, owing to the much higher severity of DBM consequences (i.e.
isolation of an entire generation plant and load shedding).
The impact of DBM-related operational states increases with the load level and the wind
farm output. This is because, when the load and wind output are high, there is a higher
probability of cascade tripping following an initial SIPS DBM, and the financial
consequences are more severe (e.g. a higher cost of lost load). Under most system
operating conditions, a single-scheme DBM (S2) has a limited impact on the system
compared with the Level 2 DBM-related states (i.e. S4, S5). The worst-case scenario
arises from S5 (DBM&SBM) when the system is stressed by heavy loading and high wind
output. This indicates that the SBM of one scheme could lead to rescheduling of
generation and changes in power flow, which may stress other parts of the system; if
this is followed by the DBM of the other scheme, the result is cascading line outages and
serious economic impact. When operating as designed (i.e. S1), the operational cost
reaches its highest level when both the wind and load levels are high, since in this case
both wind farms have to be disconnected to ensure stability after a critical line outage.
The impact of SBM depends only on the output of the two wind farms at the time of the
scheme maloperation; the economic impact in this case arises from wind curtailment,
energy re-dispatch and start-up of the tripped generators. The impact assessment results
show that the SIPS operational states associated with unintended interactions could have
a wider influence on the system and may lead to greater economic impact than individual
SIPS maloperations.
Figure 7-6: Impact Assessment Results under Various System Conditions: (a) Impact of
DBM-Related States; (b) Impact of SBM-Related States.
7.3.3. Risk Assessment Results
The risks introduced by implementing SIPS with different tripping logics and
communication architectures are illustrated and compared in this section. With the
application of a WAMPAC system, measurements from distributed substations can be
centralised and used by the central controller for SIPS decision making. Line-outage
information from both local and remote substations, especially that related to critical
lines, can be collected for centralised decision making to enhance dependability.
The performance of the SIPS communication architectures introduced in Figure 5-2 is
evaluated in the 5-bus system with two schemes. Figure 7-7 shows the risks of different
SIPS designs, with the contribution of each SIPS operational state represented by a
different colour in each column. For the local SIPS shown in Figure 7-7, only local
measurements are used for SIPS decision making. The scheme design with full
redundancy and a voting tripping logic (i.e. Arch4 (voting)) delivers the best overall
performance, with a risk of 12,690 $/year, showing that a redundant communication
network is an effective way to enhance scheme dependability. In contrast, the use of
vetoing logic in local SIPS leads to a higher overall risk owing to an increased risk of
DBM. For example, Arch4 (vetoing) has a DBM-related risk of 20,030 $/year, compared
with a DBM-related risk of 6,967 $/year for Arch4 (voting). Meanwhile, the cost of SIPS
normal operation stays at approximately 2,400 $/year for all the communication
architectures and tripping logics and is mainly determined by the operating frequencies
of the two SIPS in the system. Risks caused by the DBM of one of the SIPS are the main
contributor to the total risk for most designs in this numerical study, because DBM has a
considerably larger impact than normal operation or SBM and a higher probability than
SIPS interactions.
Implementing a duplicated process bus communication system at substation bay level and
duplicated LANs in accordance with the Parallel Redundancy Protocol (PRP) can
significantly enhance dependability. Nevertheless, the redundant communication system
may also lead to a higher risk of SBM and SBM-related interactions. For example, when a
voting tripping logic is used, introducing duplicated PRP LANs into the substation
automation system (i.e. Arch4 (voting)) increases Risk(SBM) from 6,059 $/year to
6,967 $/year.
Figure 7-7: Annual Risk Induced by Different Local SIPS Designs
When a system-wide centralised SIPS is used, additional line-outage information can be
obtained from both local and remote substations, so the level of redundancy in the
scheme activation signal is relatively high. As shown in Figure 7-8, the risks then come
mainly from SBM-related operational states. Therefore, the use of vetoing logic can
effectively reduce the risk of SBM without significantly increasing the risk of DBM. For
example, by using the vetoing tripping logic in a system-wide SIPS with Arch2, the risk
of SBM can be kept relatively low and the best overall performance can be achieved.
Meanwhile, when the redundancy level in the SIPS local communication network is low
(e.g. Arch1, without a redundant process bus and LANs in the SAS), a system-wide
centralised SIPS could effectively reduce operational risk by enhancing dependability.
Figure 7-8: Annual Risk Induced by Different System-Wide Centralised SIPS Designs
As shown in Table 7-2, an increase in the number of SIPS leads to higher probabilities of
SIPS interactions. Therefore, it is necessary to estimate the trend in the risk associated
with SIPS operation as more SIPS are implemented. Assessing the operational risk of a
system with multiple SIPS requires system studies to evaluate the consequences of each
SIPS on system operation. Assuming that the impact assessment results in Figure 7-6
represent the average impact of all the SIPS in the system, the operational risk can be
estimated by multiplying the probability of each operational state by its associated
impact.
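The risk estimate described above is a likelihood-weighted sum of state impacts. The Python sketch below shows only that aggregation step, under the assumption that each state's likelihood is expressed so that likelihood × impact yields $/year (for example, expected occurrences per year). All input numbers and the function names `operational_risk` and `risk_shares` are hypothetical and are not results from the thesis.

```python
from typing import Dict


def operational_risk(likelihood: Dict[str, float],
                     impact: Dict[str, float]) -> Dict[str, float]:
    """Risk contribution of each operational state: likelihood x impact.

    likelihood -- e.g. expected occurrences per year of each state (assumption)
    impact     -- average financial impact ($) per occurrence of that state
    """
    missing = set(likelihood) - set(impact)
    if missing:
        raise KeyError(f"no impact given for states: {sorted(missing)}")
    return {state: likelihood[state] * impact[state] for state in likelihood}


def risk_shares(contributions: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the total risk contributed by each state."""
    total = sum(contributions.values())
    return {state: value / total for state, value in contributions.items()}


if __name__ == "__main__":
    # Hypothetical inputs, not results from the thesis
    per_year = {"normal": 0.5, "DBM": 0.01, "SBM": 0.2, "interaction": 0.001}
    cost = {"normal": 4_000.0, "DBM": 500_000.0, "SBM": 20_000.0,
            "interaction": 1_000_000.0}
    contributions = operational_risk(per_year, cost)
    print(round(sum(contributions.values())), "$/year")
    print({k: round(v, 3) for k, v in risk_shares(contributions).items()})
```

Repeating this with the state probabilities for each value of N (as in Table 7-2) is one way to trace how the share of interaction risk evolves as more schemes are added.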
Figure 7-9 shows the variation in SIPS risk with the number of schemes implemented in
the system. The risks caused by all four SIPS operational states (i.e. normal operation,
DBM, SBM and interaction) increase with the number of schemes. When there are more
than 10 schemes in the system, the risk of SIPS interactions becomes the greatest
contributor, accounting for up to 33.9% of the overall risk. This indicates that, although
the risk from SIPS interactions may currently be small, it could increase dramatically
following widespread implementation of SIPS in the future.
Figure 7-9: Variation in Risks (Arch4 voting) with Number of SIPS in System
7.4. System Planning Incorporating SIPS
Although SIPS can provide a less expensive way to fulfil the reliability requirements of
a Power System, the risk assessment indicates that a high penetration of SIPS results in
increased risks owing to the high probability of undesirable interactions and increased
operational complexity. Additionally, the continuous integration of intermittent
renewable generation brings new challenges to the transmission system. Another way to
effectively alleviate congestion and allow more wind and PV integration is transmission
network upgrading. However, owing to its considerable cost, transmission upgrading is
carried out in conjunction with SIPS to enable an effective trade-off between operating
cost, investment cost and system integrity. SIPS operation under future generation and
transmission upgrade scenarios is therefore evaluated to determine the optimal
transmission and generation expansion plan.
The risk assessment method is now used to illustrate the variation in SIPS risk over a
planning horizon of 25 years, incorporating transmission upgrading, demand growth and
wind integration. It is assumed that demand increases by 1% per annum at each load
point. In order to fulfil the 'Gone Green Plan' proposed by UK National Grid, the wind
capacity of each wind farm is increased from 100 MW to 200 MW in steps of 25 MW
every 5 years, so that wind energy supplies 26.6% of the total energy at the end of the
planning horizon. The Locational Marginal Price (LMP) [100] is used to determine the
candidate lines for transmission expansion: by comparing the LMP at each bus, the
candidate line is chosen to connect the bus with the lowest LMP to the bus with the
highest LMP. Using this method, L1-3 (i.e. the circuit connecting buses 1 and 3) is
identified as the candidate line. The new transmission line is expected to reduce the
number of times the SIPS are required to operate and to allow greater power transfer
from the lowest-cost power plant. Therefore, the annual production cost and the
frequency of wind curtailment after transmission expansion are examined.
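The candidate-line rule above reduces to finding the buses with the lowest and highest LMP. The Python sketch below illustrates only that comparison step; in a real study the LMPs would come from an optimal power flow, and the bus prices used here are hypothetical values, not the LMPs of the studied 5-bus system. The function name `candidate_line` is likewise hypothetical.

```python
from typing import Dict, Optional, Tuple


def candidate_line(lmp: Dict[str, float]) -> Optional[Tuple[str, str]]:
    """Pick the corridor joining the lowest-LMP bus to the highest-LMP bus.

    Returns None when all buses have the same LMP (no price separation,
    hence no congestion signal to act on).
    """
    if len(lmp) < 2:
        raise ValueError("need LMPs for at least two buses")
    low = min(lmp, key=lmp.get)
    high = max(lmp, key=lmp.get)
    if lmp[low] == lmp[high]:
        return None
    return low, high


if __name__ == "__main__":
    # Hypothetical LMPs in $/MWh, for illustration only
    prices = {"B1": 14.0, "B2": 30.0, "B3": 35.0, "B4": 28.0, "B5": 20.0}
    print(candidate_line(prices))   # ('B1', 'B3')
```

A wide LMP spread between two buses signals binding transmission constraints between them, which is why the rule targets that pair; it ignores right-of-way and build cost, which a full expansion model would also weigh.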
Figure 7-10: PJM 5-Bus System with Transmission Expansion (buses B1-B5, generators G1-G4, wind farms WF1 and WF2 at 100 MW each, lines L1-L6 and the new circuit L1-3)
The variation in SIPS operational risks in the 5-bus system over the 25-year planning
horizon is shown in Figure 7-11. It is assumed that the new transmission line L1-3 is
built in the 20th year. Owing to the continuously increasing wind integration and load
demand, the operational risk of each of the four SIPS operational states continues to
increase. If transmission upgrading is not implemented, the SIPS risk will increase from
12,690 $/year to 40,770 $/year by the end of the 25-year period, and congestion in the
system will significantly impede the integration of large-scale renewable generation.
Building new transmission lines linking the wind-rich areas to the load centre (e.g. L1-2,
L1-3) can effectively relieve congestion and significantly reduce the risks introduced by
SIPS operation, keeping the overall risk within 25,000 $/year. This is achieved through
the reduced operating frequency of SIPS-1 and SIPS-2. However, it leads to a noticeable
increase in the risk of SBM, because the schemes have an additional transmission line to
monitor during a system disturbance.
Figure 7-11: Variation in SIPS risks in a planning horizon of 25 years
The impact of line expansion on system production cost and on wind curtailment by SIPS
is shown in Table 7-4. Despite the considerable investment cost incurred by transmission
expansion, the total production cost can be reduced by transferring large amounts of
cheaper energy from the wind-rich areas to the load centre. The reduction in production
cost also grows over time as wind generation and load demand increase; for example, in
year 25 the introduction of L1-3 reduces production cost by 3.62×10⁵ $/year. In
addition, wind curtailment due to SIPS operation is reduced when either L1-2 or L1-3 is
introduced. By integrating SIPS risks into the transmission expansion model, a
SIPS-aided transmission expansion plan can be developed to minimise production and
investment costs.
Table 7-4: System Production Cost and Wind Curtailment with Simulation Year
Year   Production Cost,     Production Cost Change   Wind Curtailment,      Wind Curtailment
       No L1-3 ($/year)     with L1-3 ($/year)       No L1-3 (MWh/year)     with L1-3 (MWh/year)
1      7.12×10⁷             -1.58×10⁵                6.91                   1.96
10     8.93×10⁷             -2.22×10⁵                18.89                  7.10
20     1.13×10⁸             -3.01×10⁵                40.18                  17.85
25     1.25×10⁸             -3.62×10⁵                49.99                  24.54
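To put Table 7-4 in relative terms, the Python sketch below computes, for each simulated year, the production cost change as a percentage of the no-L1-3 cost and the share of wind curtailment avoided by L1-3. The numbers are copied from the table; the function name `summarise` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Table 7-4: year -> (production cost without L1-3 [$/year],
#                     change in production cost with L1-3 [$/year],
#                     wind curtailment without L1-3 [MWh/year],
#                     wind curtailment with L1-3 [MWh/year])
TABLE_7_4: Dict[int, Tuple[float, float, float, float]] = {
    1: (7.12e7, -1.58e5, 6.91, 1.96),
    10: (8.93e7, -2.22e5, 18.89, 7.10),
    20: (1.13e8, -3.01e5, 40.18, 17.85),
    25: (1.25e8, -3.62e5, 49.99, 24.54),
}


def summarise(year: int) -> Dict[str, float]:
    """Relative production cost change and curtailment reduction from L1-3."""
    cost, delta, curt_without, curt_with = TABLE_7_4[year]
    return {
        "cost_change_pct": 100.0 * delta / cost,
        "curtailment_reduction_pct":
            100.0 * (curt_without - curt_with) / curt_without,
    }


if __name__ == "__main__":
    for year in sorted(TABLE_7_4):
        s = summarise(year)
        print(year, f"{s['cost_change_pct']:.2f} %",
              f"{s['curtailment_reduction_pct']:.1f} %")
```

On these figures the absolute saving grows from 1.58×10⁵ to 3.62×10⁵ $/year but stays at roughly 0.2-0.3 % of production cost, while the share of curtailment avoided falls from about 72 % in year 1 to about 51 % in year 25.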
7.5. Sensitivity Study
Sensitivity analysis is carried out to evaluate how the high uncertainty in the simulation
data affects the risk assessment results. The reliability of the components used in a
scheme's communication system is determined by their failure rate (λ) and repair rate
(μ). The failure rate is expected to vary over the life cycle of a device and between
manufacturers. The frequency of inspection and maintenance of the components,
reflected in the repair rate, also has a great impact on scheme risks. In addition, as
illustrated in the component-level Markov Model in Section 5.4.2, the failure rates of
detectable DBM ($\lambda_{dd}$), undetectable DBM ($\lambda_{ud}$) and SBM
($\lambda_{st}$) are in the ratio $\lambda_{dd} : \lambda_{ud} : \lambda_{st} = 2 : 1 : 2$.
Different devices might have different self-monitoring abilities, which will lead to
significant variation in the probability of being in each failure mode.
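A sensitivity run needs per-mode failure rates derived from a device's overall failure rate. The Python sketch below splits a total rate according to the 2:1:2 ratio quoted above; treating the three modes as summing to the device's total failure rate is an assumption made for illustration, and the total rate in the example (0.05 failures/year) and the function name `split_failure_rate` are hypothetical.

```python
from typing import Tuple

# Ratio lambda_dd : lambda_ud : lambda_st quoted for the component-level
# Markov model (Section 5.4.2)
DEFAULT_RATIO = (2.0, 1.0, 2.0)


def split_failure_rate(total_rate: float,
                       ratio: Tuple[float, float, float] = DEFAULT_RATIO
                       ) -> Tuple[float, float, float]:
    """Split a total failure rate into (lambda_dd, lambda_ud, lambda_st).

    Assumes (for illustration) that the three failure modes together make up
    the total failure rate of the device.
    """
    if total_rate < 0 or any(r < 0 for r in ratio) or sum(ratio) == 0:
        raise ValueError("need a non-negative rate and non-negative ratio "
                         "weights with a positive sum")
    weight = sum(ratio)
    lam_dd, lam_ud, lam_st = (total_rate * r / weight for r in ratio)
    return lam_dd, lam_ud, lam_st


if __name__ == "__main__":
    # Hypothetical total failure rate of 0.05 failures/year
    print(split_failure_rate(0.05))
```

With the default ratio, λdd and λst each take 2/5 of the total and λud takes 1/5. Passing a different ratio, for instance one with a larger detectable share, is one way to mimic a device with better self-monitoring in a sensitivity run.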
Figure 7-12: Impact of Variation in Reliability Data on SIPS Risks
The impact of these uncertainties on the simulation results is evaluated through a
sensitivity study. As shown in Figure 7-12, using a more reliable device with a lower
failure rate always leads to better overall performance. Likewise, increasing the repair
rate reduces the overall risk brought by the schemes. The repair rate of a device's
undetectable DBM (μud) can be increased through more frequent scheme inspection and
maintenance. For failures that can be detected by the self-monitoring function (i.e.
detectable DBM and SBM), more timely replacement of the faulty device could effectively
reduce operational risk. With enhanced self-monitoring ability, a higher percentage of
DBM failures can be detected by the device (i.e. increased α), which significantly
reduces the operational risk of the system. For example, if the percentage of detectable
DBM increases from the original 40% to 60%, the total annual risk decreases from
12,662 $/year to 9,240 $/year. Consequently, allocating more maintenance effort to the
critical components and to condition monitoring of these devices could be effective in
mitigating SIPS operational risk.
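The qualitative trends above (lower failure rate or higher repair rate gives lower risk) can be seen in the simplest reliability model. The Python sketch below uses the textbook two-state (up/down) Markov component, whose steady-state unavailability is U = λ/(λ+μ). This is a deliberately simplified stand-in, not the multi-state component model of Section 5.4.2, and the device data (0.05 failures/year, 48 h and 24 h repair times) are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def unavailability(failure_rate_per_year: float, mttr_hours: float) -> float:
    """Steady-state unavailability of a two-state (up/down) Markov component.

    U = lambda / (lambda + mu), where the repair rate mu = 1 / MTTR is
    converted to repairs per year so both rates share the same unit.
    """
    if failure_rate_per_year < 0 or mttr_hours <= 0:
        raise ValueError("need failure rate >= 0 and MTTR > 0")
    mu = HOURS_PER_YEAR / mttr_hours
    return failure_rate_per_year / (failure_rate_per_year + mu)


if __name__ == "__main__":
    # Hypothetical device: 0.05 failures/year
    for mttr in (48.0, 24.0):          # halving the repair time doubles mu
        print(mttr, f"{unavailability(0.05, mttr):.3e}")
    print("more reliable device:", f"{unavailability(0.025, 48.0):.3e}")
```

Because μ is usually much larger than λ, U is close to λ/μ, so halving either the failure rate or the repair time roughly halves the unavailability.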
7.6. Managing the SIPS Risk Using Adaptive SIPS
As described in the sensitivity study, the increasingly variable operating conditions of
Power Systems significantly affect SIPS performance. It is therefore difficult for a
single SIPS design to suit all weather and system conditions. The introduction of
WAMPAC offers an opportunity to improve SIPS performance using more adaptive and
intelligent protection logic, allowing the system operator to shift the balance between
dependability and security according to the current system conditions.
As shown in Figure 7-13, the information collected by the sensors located at each
substation and wind farm is centralised using the WAN. An adaptive SIPS is designed to
select the optimal operating logic for all the SIPS in the system based on current system
conditions. The wind generation outputs of the wind farms and the load levels of all load
points are collected by local sensors and sent to the controller via the WAN. The
centralised controller estimates the risk of the different SIPS operating logics using the
risk assessment method proposed in this thesis and chooses the logic that achieves the
minimum operational risk and the best SIPS coordination.
In this case study, each of the two GRS (i.e. SIPS-1 and SIPS-2) can adjust its tripping
algorithm to use either a "voting" or a "vetoing" logic. In addition, each GRS can decide
whether to use additional activation signals from the remote substations (i.e. B2 and B3)
to enhance the dependability of its operation.
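With two logics and two signal options per GRS, the adaptive controller only has to compare 16 configurations at each update. The Python sketch below shows that exhaustive minimum-risk selection. The risk model `make_toy_risk` is entirely hypothetical: it simply weights DBM more heavily as the load level rises and assumes that voting and remote signals lower DBM but raise SBM, whereas the thesis evaluates each configuration with its full risk assessment method. The names `select_logic`, `LOGICS` and `SIGNALS` are also illustrative.

```python
from itertools import product
from typing import Callable, Tuple

LOGICS = ("voting", "vetoing")
SIGNALS = ("local", "system-wide")   # whether remote activation signals are used

Option = Tuple[str, str]             # (tripping logic, signal source) for one GRS
Config = Tuple[Option, Option]       # (SIPS-1 option, SIPS-2 option)


def select_logic(risk_of: Callable[[Config], float]) -> Tuple[Config, float]:
    """Enumerate all option pairs for the two GRS; return the lowest-risk one."""
    options = list(product(LOGICS, SIGNALS))
    best = min(product(options, repeat=2), key=risk_of)
    return best, risk_of(best)


def make_toy_risk(load_level: float) -> Callable[[Config], float]:
    """Hypothetical risk model (arbitrary units), for illustration only.

    DBM is weighted by the load level (a stressed system suffers more from a
    failure to operate) and SBM by the remaining margin. Voting and remote
    signals make DBM less likely but SBM more likely; vetoing does the opposite.
    """
    def risk(config: Config) -> float:
        total = 0.0
        for logic, signal in config:
            dbm = (0.3 if logic == "voting" else 1.0) * (0.5 if signal == "system-wide" else 1.0)
            sbm = (1.0 if logic == "voting" else 0.3) * (1.5 if signal == "system-wide" else 1.0)
            total += load_level * dbm + (1.0 - load_level) * sbm
        return total
    return risk


if __name__ == "__main__":
    for load in (0.2, 0.9):
        config, r = select_logic(make_toy_risk(load))
        print(load, config, round(r, 3))
```

In this toy model a lightly loaded system selects local vetoing for both schemes and a heavily loaded one selects system-wide voting, mirroring the qualitative behaviour described in this section. Because the toy risk is identical for both schemes, it cannot reproduce mixed choices such as voting for SIPS-1 with vetoing for SIPS-2; that requires per-scheme, condition-specific risk estimates.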
Figure 7-14 shows the variation in SIPS operational risk in the studied PJM 5-bus system
under varying system conditions during a winter month. The operational risk when both
GRS are local schemes with the Arch4 (voting) design is compared with the risk induced
by the adaptive SIPS. The adaptive SIPS offers a noticeable reduction in operational risk
compared with the conventional design, and also helps reduce the probability of
cascading failures when the system is heavily stressed by high wind output and load
demand. The operating logic of the adaptive SIPS is switched 121 times in 720 hours,
i.e. on average once every 5.95 hours.
Figure 7-13: Adaptive SIPS using WAMPAC Platform
Figure 7-14: Variation in SIPS Risks under Different System Conditions
In particular, when the load level and the wind levels of both wind farms are relatively
low (e.g. at time t1), the system has sufficient generation reserve and transmission
facilities to provide robust alternative paths after contingencies. SIPS DBM therefore
has limited impact on the system under this condition, and the optimal performance is
achieved when the vetoing logic, which offers the best security, is applied to both
SIPS-1 and SIPS-2. At t2, when the load level is predicted to increase to 81.9% and the
wind speed at WF1 is expected to be high, SIPS-1 switches to the voting logic and
collects line-outage signals from both local and remote substations to maximise
dependability. With SIPS-1 highly dependable and the wind level at WF2 low, the
consequences of a SIPS-2 DBM can be effectively mitigated by the operation of SIPS-1,
making SIPS-1 critical in maintaining scheme dependability. SIPS-2 therefore uses the
vetoing logic to reduce security risks without significantly compromising system
dependability. Consequently, the overall risk is reduced from 5.2 $/hour to 2.9 $/hour.
At t3, as the wind level at WF2 increases, SIPS-2 starts to use the remote signals to
enhance its dependability, reducing the operational risk from 1.4 $/hour to 0.61 $/hour.
Figure 7-15 illustrates the variation in SIPS operational risk over three typical days. The
changes in the operating logic of the adaptive SIPS over a 24-hour period are shown in
Figure 7-16. During the night (i.e. 23:00-6:00), when the load level is low, the
consequences of a SIPS dependability-based maloperation are relatively small.
Consequently, the most secure operating logic, i.e. the "vetoing" logic, is used by both
SIPS to ensure the highest security. During the daytime, when the load level is relatively
high, wide-area information is used as a redundant activation signal to enhance scheme
dependability. When the wind levels at both wind farms are high (e.g. Scenarios 1 and 2),
the operational risk of the system can be kept low by shedding the wind farms during
contingencies.
Figure 7-15: Variations in SIPS Risks in Three Typical Days
The highest operational risk occurs under low wind and high load demand (i.e.
Scenario 3), when the operation of the GRS on the wind farms has limited impact on
relieving system congestion. During the peak load period of the day in Scenario 3 (i.e.
17:00-19:00), SIPS-2 is switched to the most secure "vetoing" logic to prevent a spurious
trip of wind farm 2 (WF2). This is vitally important when load demand is high and
generation reserve is insufficient: in this case, the trip of WF2 may lead to increased
generation output at the plant at B1 and eventually cause overloading in other parts of
the system, so that load shedding schemes are required to prevent cascading failure by
disconnecting some of the load.
It can be concluded that SIPS based on predetermined operating logic may not
necessarily deliver optimal operation. The hierarchically layered control actions and the
continuously varying system conditions require system operators to make control
decisions based on real-time data and a system-wide view. Therefore, the key to effective
SIPS applications lies not only in the measurement IEDs and communication
infrastructure, but also in fast computing and data processing hardware and analysis
software tools that offer valid solutions for various system conditions. The proposed risk
assessment procedure offers an effective method to ensure optimal SIPS performance and
supports system operators in decision making during severe system contingencies.
Figure 7-16: Operational Logics of Adaptive SIPS during a Day for each Scenario (timeline 0:00-24:00 for SIPS-1 and SIPS-2 in Scenarios 1-3; logics shown: Vetoing, System-wide Voting and System-wide Vetoing)
7.7. Summary
This chapter provides a procedure to assess the impact of undesirable interactions
between SIPS. The evaluation results indicate that SIPS maloperations and interactions
introduce additional risks to the system. Different SIPS interaction scenarios in the PJM
5-bus system with two GRS are studied. A SIPS-aided transmission expansion plan is
developed to illustrate the impact of future energy integration and transmission
expansion on SIPS risks.
Unintended interactions between SIPS could result in cascading failures and lead to a
more severe impact than individual SIPS failures. In addition, the operating risk exposure
of SIPS, and especially the risk caused by SIPS interactions, would increase significantly
with greater wind integration and as the number of SIPS increases. The probability and
severity of unintended SIPS interactions are highly dependent on system conditions:
under stressed conditions with high load demand and generation output, cascading SIPS
operation is more likely to occur.
In the near future, system operators will face more severe challenges in managing the
operational risk of SIPS. Therefore, SIPS are considered within the context of Power
System long-term planning. The continuously increasing wind penetration and load level
will increase the operational cost of SIPS. Building new transmission circuits could
significantly reduce SIPS risk, decrease wind curtailment and allow more access to
cheaper energy in the system. Therefore, a SIPS-assisted transmission upgrading plan
helps maximise reliable system operation whilst minimising production and investment
costs.
A new type of SIPS with adaptive operating logic, which adjusts to varying system
conditions, is developed to manage SIPS-induced risks. Existing SIPS are built on
predetermined seasonal and off-line mitigation actions. The adaptive SIPS, made possible
by modern IEDs and wide-area monitoring systems, allows system operators to shift the
balance between dependability and security. When the system is less stressed, with a
relatively low load level and generation output, SBM is the main source of SIPS
operational risk, so more secure protection logic can be implemented. In contrast, when
the system is heavily stressed or when the other scheme is in a failure state, successful
SIPS operation is vitally important in preventing cascading failure, and the scheme needs
to be highly dependable. Adaptive protection not only helps reduce the risk of DBM and
SBM of individual SIPS but also achieves better coordination between SIPS, so that the
significant variation in SIPS operational risk caused by fast-changing system conditions
can be effectively mitigated.
The proposed methodology can help utilities understand the impact of advanced ICT on
SIPS reliability and quantify the continuously increasing probability of unintended
interactions among SIPS on the same or neighbouring systems. It also helps ensure
optimal SIPS performance and supports system operators in decision making during
severe system contingencies.
CHAPTER 8
CONCLUSIONS AND FUTURE WORK
8.1. Introduction
Continuously growing energy demand, the connection of bulk renewable generation and
the deregulation of the electricity market have brought significant challenges to Power
System reliability. System Integrity Protection Schemes (SIPS), fully integrated with ICT,
modern IEDs and advanced control algorithms, are now being implemented by system
operators to minimise the probability of large system disturbances and to fulfil the strict
requirements for overall Power System reliability. However, changes in the ICT
infrastructure and ageing protection assets raise major concerns about the reliability of
protection systems and their impact on system operation, especially during severe
operating conditions or Power System contingencies.
The aim of this research is to provide insight into the reliability of the protection
schemes installed in the transmission network. To achieve this, the main causes of SIPS
maloperations and their impact on system operation are studied. Probabilistic reliability
assessment models have been developed to quantitatively assess the risk of SIPS and to
determine the optimal SIPS design and operating logic. Furthermore, reliability
enhancement methods and design considerations to improve SIPS
Chapter 8: Conclusions and Future Work
Page | 213
performance are discussed. The main conclusions drawn from this research and
suggestions for the future work are summarised in the following sections.
8.2. Conclusions
SIPS are a cost-effective and easy-to-implement means of enhancing Power System
reliability and maximising transfer capacity. The use of SIPS becomes increasingly
attractive as the level of renewable generation increases drastically and societal policies
stall transmission upgrades. Although SIPS provide corrective control actions for various
abnormal system conditions and are designed to preserve system integrity, failure to fulfil
their reliability requirements will expose the Power System to additional risks.
Major SIPS-related system disturbances reveal the consequences of SIPS maloperation. In
the pre-cascading phase, effective and fast SIPS operation is vital in preventing the
spread of a disturbance. Incorrect or delayed SIPS operation, or a failure to operate,
increases the probability of the system entering the cascading phase and may eventually
lead to severe consequences such as load disconnection. Surveys conducted by the
IEEE-CIGRE working group show that SIPS normally fail in two ways: 1)
dependability-based maloperation (DBM), i.e. a failure to operate when the SIPS is
required; and 2) security-based maloperation (SBM), i.e. an unwanted SIPS operation
when there is no disturbance in the system. In addition, owing to the high penetration of
SIPS in many Power Systems, the complexity of system operation has increased
significantly in recent years, which also leads to a higher probability of undesired
interactions between SIPS located on the same or neighbouring systems.
A review of the system disturbance reports issued by NERC indicates that the main causes
of SIPS-related events are hardware or software failures, faulty design logic and human
errors. In addition, the majority of the recorded events were caused by SIPS
security-based maloperations. The consequences of a SIPS failure to operate (i.e. DBM)
are normally much more severe than those of unnecessary SIPS operations (i.e. SBM).
Nevertheless, SBM, although it normally has limited consequences for system operation,
needs to be properly considered in the risk assessment model because of its greater
likelihood of occurrence. The aim is to balance the trade-off between scheme
dependability and security in SIPS operation.
The importance of SIPS reliability has been recognised by utilities and is addressed in
SIPS design and operation. Reliability standards have been developed to evaluate the
performance of protection schemes and ensure that they fulfil strict reliability
requirements. The review of existing SIPS applications illustrates industry practice and
attempts to use new technologies for monitoring, communication and control to further
enhance SIPS performance. Advances in ICT enable real-time monitoring of system
conditions and provide more accurate state estimation to facilitate SIPS decision
making. This also brings more flexibility to SIPS design and opens up more ways of
enhancing SIPS performance. However, changes in the instrumentation, monitoring,
protection and control systems also raise major concerns about the overall reliability of
SIPS, which need to be considered in the reliability assessment.
Substations, as key components of a power grid, play a vital role in monitoring and
controlling power flows and in interconnecting generating facilities, transmission and
distribution networks and customers. Successful operation of both local and system-wide
SIPS relies heavily on the monitoring, communication and control functions
of the substation automation system (SAS). The impact of component reliability on the
performance of different communication services in an IEC 61850 based substation is
discussed in Chapter 3. This is demonstrated by studying the availability of both
reporting and multicast communication services in the SAS. Component reliability,
system architecture and maintenance strategies are the main factors affecting the
reliability of the substation communication services. Redundancy is also shown to be an
effective way to improve system performance by eliminating single points of failure.
The inherent redundancy of the RSTP ring station bus, and the redundant communication
paths implemented in accordance with IEC 62439, significantly improve the MTTF of data
transfer in the SAS. Fast identification and repair of failed components are also
vitally important.
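As a rough illustration of why redundancy and fast repair matter, the sketch below computes the steady-state availability A = MTTF/(MTTF + MTTR) of repairable components, combines them in series and in parallel as a reliability block diagram would, and compares a single communication path with a duplicated one. The component names and MTTF/MTTR figures are hypothetical placeholders, not the data used in Chapter 3, and the parallel formula assumes independent failures and repairs.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Component:
    """A repairable component described by its MTTF and MTTR in hours."""
    name: str
    mttf_h: float
    mttr_h: float

    @property
    def availability(self) -> float:
        # Steady-state availability of a two-state (up/down) repairable component.
        return self.mttf_h / (self.mttf_h + self.mttr_h)


def series(avails):
    """All blocks must be up: product of availabilities (independence assumed)."""
    return prod(avails)


def parallel(avails):
    """At least one block must be up: 1 - product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in avails)


# Hypothetical figures, for illustration only.
ied = Component("protection IED", mttf_h=150_000, mttr_h=8)
switch = Component("Ethernet switch", mttf_h=50_000, mttr_h=8)
link = Component("fibre link", mttf_h=100_000, mttr_h=24)

path = series([switch.availability, link.availability])
single = series([ied.availability, path])
duplicated = series([ied.availability, parallel([path, path])])

print(f"single path:     unavailability = {1 - single:.2e}")
print(f"duplicated path: unavailability = {1 - duplicated:.2e}")
```

Duplicating the path leaves the single IED as the remaining single point of failure, so the overall unavailability falls by much less than the path's own unavailability does; shortening MTTR improves every block at once.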
Protection devices are identified as among the most critical components in the SAS,
since they involve more hardware devices, software routines, firmware modules and
user-defined settings than other types of equipment used in the SAS. Tripping signals from
protection devices are frequently used by SIPS for fast detection of a line outage. The
UK transmission network predominantly uses electronic or early numerical
protection equipment to detect and clear short-circuit faults. A significant number of
these protection devices are now reaching their predicted design lifetime of 25 years.
Consequently, the reliability and performance of these local protection devices need to be
investigated to ensure that they are still within their reliable service lifetime and,
furthermore, that they do not adversely affect the operation of SIPS that utilise their
output responses. A lifetime assessment carried out in Chapter 4 evaluates the operational
conditions and identifies the life-limiting elements of the most commonly used
electronic relay types in the UK National Grid 400 kV and 275 kV transmission
networks.
The protection maloperation record indicates that all the selected relay types are serving
in a highly reliable manner, with no statistical evidence of vulnerable components or
modules. In addition, all three relay types offer performance equal to that of modern
relay types in operational speed and accuracy for their intended functions. The
components most vulnerable to thermal stress, high current stress and voltage stress
are identified as life-limiting elements and then examined via 3D X-ray micro-tomography.
No signs of degradation or wear-out were identified. The study
helped National Grid extend the reliable service life of these protection relays for an
initial extension period of five years. It was concluded that retaining the relays achieves
reliability performance equal to that which would be achieved by replacing them. In
addition, the risks of infant mortality failures or initial application problems associated
with replacement relays are avoided. This study also helps ensure the successful operation
of SIPS, which relies on reliable and timely operation of local protection.
To better manage the additional risks brought by SIPS, studies are required to evaluate
the impact of SIPS failures on the Power System and to use the results to develop
appropriate reliability assessment models. An analytical risk assessment method based
on the reliability block diagram and the Markov model was developed in Chapter 5 and
used to quantify the risks associated with SIPS during normal operation, dependability-based
maloperation and security-based maloperation. By performing FMEA, all the
different failure modes co-existing in a SIPS component, and their impact on the overall
SIPS reliability, are identified. Probabilities of different SIPS maloperations are
estimated by combining SIPS operational modes with different system events. The risk
of SIPS operation is calculated as the probability of each failure state weighted by its
corresponding financial impact.
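The risk calculation described above, in which each state's probability is weighted by its financial impact, can be sketched with a small continuous-time Markov model. The three states (normal, DBM, SBM), the transition rates and the consequence costs below are hypothetical placeholders chosen for illustration; the Chapter 5 model combines many more component states and system events. The steady-state probabilities are obtained by solving πQ = 0 with Σπ = 1.

```python
def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 for a CTMC generator matrix Q (list of rows)."""
    n = len(q)
    # Balance equations pi Q = 0 transposed into A x = b, with the last equation
    # replaced by the normalisation condition sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / a[r][r]
    return x


# Hypothetical three-state model, rates per year.
states = ["normal", "DBM", "SBM"]
lam_dbm, lam_sbm = 0.02, 0.10        # normal -> DBM, normal -> SBM
mu_dbm, mu_sbm = 365 / 3, 365 / 0.5  # mean restoration times: 3 days (DBM), 12 hours (SBM)
Q = [
    [-(lam_dbm + lam_sbm), lam_dbm, lam_sbm],
    [mu_dbm, -mu_dbm, 0.0],
    [mu_sbm, 0.0, -mu_sbm],
]
cost = {"normal": 0.0, "DBM": 5.0e6, "SBM": 2.0e5}  # financial impact (£), hypothetical

pi = steady_state(Q)
risk = sum(p * cost[s] for p, s in zip(pi, states))
for s, p in zip(states, pi):
    print(f"P({s}) = {p:.3e}")
print(f"risk = £{risk:,.0f}")
```

The same approach extends to larger models: only the generator matrix and the cost vector change.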
This procedure is used to illustrate how the different SIPS communication architectures
could affect SIPS performance. The implementation of a duplicated communication
network can significantly enhance the performance in terms of scheme dependability.
However, more redundancy may not necessarily result in better overall performance,
since it also leads to increased security risk. The method is next used to compare the
performance of SIPS with two different tripping logics: voting logic and vetoing logic.
The impact of the two tripping logics on the trade-off between dependability and
security in SIPS design was also studied. SIPS with 1-out-of-2 voting logic delivers
better dependability. However, because of the high level of redundancy in
the SIPS design, the probability of unwanted SIPS operation caused by
misinterpretation of inputs or data is increased. In this case, vetoing logic can be used to
effectively prevent spurious SIPS operations and mitigate the security risks.
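The dependability/security trade-off between tripping logics can be illustrated with a simple probability sketch. It assumes two independent, identical input channels, each missing a genuine event with probability q and asserting spuriously with probability s. The vetoing logic is modelled here, as an assumption rather than the exact Chapter 5 logic, as 1-out-of-2 voting supervised by an independent check that wrongly blocks a genuine trip with probability b and fails to block a spurious one with probability f. All numbers are hypothetical.

```python
def one_out_of_two(q, s):
    """1-out-of-2 voting: trip if either channel asserts."""
    p_dbm = q * q                 # both channels must miss the event
    p_sbm = 1 - (1 - s) ** 2      # either channel asserting spuriously trips
    return p_dbm, p_sbm


def two_out_of_two(q, s):
    """2-out-of-2 voting: trip only if both channels assert."""
    p_dbm = 1 - (1 - q) ** 2      # a single missed detection blocks the trip
    p_sbm = s * s                 # both channels must assert spuriously
    return p_dbm, p_sbm


def one_out_of_two_with_veto(q, s, f, b):
    """1-out-of-2 voting supervised by an independent veto check (hypothetical model)."""
    dbm_vote, sbm_vote = one_out_of_two(q, s)
    # A genuine trip fails if the vote fails or the veto wrongly blocks it.
    p_dbm = 1 - (1 - dbm_vote) * (1 - b)
    # A spurious vote becomes a trip only if the veto also fails to block it.
    p_sbm = sbm_vote * f
    return p_dbm, p_sbm


q, s, f, b = 1e-3, 1e-3, 1e-2, 1e-4  # hypothetical per-demand probabilities
results = {
    "1oo2": one_out_of_two(q, s),
    "2oo2": two_out_of_two(q, s),
    "1oo2 + veto": one_out_of_two_with_veto(q, s, f, b),
}
for name, (dbm, sbm) in results.items():
    print(f"{name:12s} P(DBM) = {dbm:.2e}  P(SBM) = {sbm:.2e}")
```

In this model the veto reduces the spurious-trip probability of 1-out-of-2 voting by the factor f, at the cost of adding roughly b to the dependability failure probability; other values of f and b can change the ranking.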
One of the main concerns in reliability assessment is the accuracy of the reliability data.
In this study, component reliability data are based on published information. The
challenges in extrapolating these into system data have been recognised and
sensitivity studies were therefore undertaken. These show that the performance of SIPS is
significantly affected by the MTTF, MTTR and system conditions (e.g. line outage rate,
load level). In addition, sensitivity studies performed on the SIPS risk assessment
results provide useful guidance for utilities in identifying the least reliable SIPS
component or operational phase. Reliability enhancement strategies can then be
implemented accordingly, and enhanced inspection and maintenance allocated to the
vulnerable components. The studies also show that the arming phase is as important as
the activation phase in enhancing SIPS performance. Finally, applying sensitivity
studies to Power System data allows SIPS performance under extreme system
conditions to be considered effectively.
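A one-at-a-time sensitivity study of the kind described above can be sketched by applying an adverse 10 % change to each input (MTTF decreased, MTTR increased) and ranking the inputs by the resulting change in a risk index. Here the risk index is simply the unavailability of a series chain of SIPS components multiplied by a consequence cost; the component names, the arming/activation split, the MTTF/MTTR values and the cost are hypothetical.

```python
# Hypothetical component data: (MTTF in hours, MTTR in hours).
components = {
    "arming: measurement": (80_000.0, 12.0),
    "arming: logic solver": (120_000.0, 6.0),
    "activation: comms": (40_000.0, 24.0),
    "activation: trip relay": (200_000.0, 8.0),
}
COST = 1.0e6  # £ consequence of the scheme being unavailable on demand (hypothetical)


def risk(data):
    """Risk index = unavailability of a series chain x consequence cost."""
    availability = 1.0
    for mttf, mttr in data.values():
        availability *= mttf / (mttf + mttr)
    return (1.0 - availability) * COST


def sensitivities(data, step=0.10):
    """Relative change in risk for an adverse `step` change in each MTTF and MTTR."""
    base = risk(data)
    out = {}
    for name, (mttf, mttr) in data.items():
        for param, new in (("MTTF", (mttf * (1 - step), mttr)),
                           ("MTTR", (mttf, mttr * (1 + step)))):
            perturbed = {**data, name: new}  # change one parameter at a time
            out[(name, param)] = (risk(perturbed) - base) / base
    return out


ranked = sorted(sensitivities(components).items(), key=lambda kv: kv[1], reverse=True)
for (name, param), change in ranked:
    print(f"{name:24s} {param}: risk +{change:.1%}")
```

The top of the ranking points to where enhanced inspection and maintenance would pay off most; a real study would use the full risk model rather than this series-chain proxy.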
With recent advances in wide-area monitoring, protection and control technology,
some utilities have implemented SIPS with a significant degree of centralisation.
As illustrated in Chapter 6, a system-wide GRS can provide better performance than a
local GRS, provided that the wide-area communication network is relatively reliable.
Access to wide-area information could significantly assist the monitoring of system
conditions and bring more flexibility to SIPS logic design. In the future, centralised
SIPS could significantly facilitate coordination amongst adjacent protection schemes.
According to the UK National Grid Electricity Ten Year Statement (ETYS), the future
Great Britain energy landscape will involve significant deployment of
renewable energy, such as wind generation, to decrease the carbon intensity of the
electricity system. Power System operating conditions will become more
unpredictable due to the intermittent nature of renewables and demand-side
participation. Generator rejection schemes (GRS) are frequently used to trip non-priority
generators during overloading and so make full use of the transmission capacity.
The performance of a GRS implemented in a wind-rich system is analysed in the numerical
studies. Failure of the GRS to operate during system contingencies could cause cascading
tripping of the transmission lines due to overloading and eventually lead to the isolation
of the wind farm or the load.
To evaluate the impact of the significant variation in wind generation on the
risk assessment results, a stochastic risk assessment method based on Sequential Monte
Carlo Simulation (SMCS) was developed in Chapter 6. By integrating an ARMA wind
prediction model and a dynamic load model, the method captures the time-series
variations in load and wind generation as well as time-dependent events. When the GRS
is implemented at a wind farm, significant variations in the operational risk associated
with GRS normal operation, DBM and SBM are observed due to variations in wind
generation output. Therefore, a precise wind prediction model and a dynamic risk
assessment method are critical in forecasting and managing GRS risks.
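The sequential simulation can be sketched as follows: hourly wind speeds are generated from an ARMA(1,1) model of the standardised series, y_t = φy_{t-1} + a_t − θa_{t-1}, de-standardised as SW_t = μ + σy_t, and converted to wind farm output through a power curve; each hour then contributes the expected cost of GRS normal operation (curtailment), DBM and SBM. The ARMA coefficients, wind statistics, the linear power curve, the post-contingency limit, the probabilities and the costs are all hypothetical, the load is assumed flat rather than following the IEEE-RTS model, and within each hour the outage and GRS failure events are handled by their expected values rather than sampled, a simplification that avoids very long runs for rare events.

```python
import math
import random
from statistics import mean

RATED_MW = 500.0  # wind farm capacity (hypothetical)


def arma_wind_speeds(hours, rng, mu=9.0, sd=4.0, phi=0.85, theta=0.2):
    """Hourly wind speeds SW_t = mu + sd * y_t from an ARMA(1,1) standardised series
    y_t = phi*y_{t-1} + a_t - theta*a_{t-1} (parameters hypothetical)."""
    # White-noise std chosen so that the stationary y_t has unit variance.
    sigma_a = math.sqrt((1 - phi ** 2) / (1 + theta ** 2 - 2 * phi * theta))
    y_prev = a_prev = 0.0
    speeds = []
    for _ in range(hours):
        a = rng.gauss(0.0, sigma_a)
        y = phi * y_prev + a - theta * a_prev
        speeds.append(max(0.0, mu + sd * y))  # negative speeds truncated to zero
        y_prev, a_prev = y, a
    return speeds


def wind_power(v, v_ci=3.5, v_r=13.0, v_co=25.0):
    """Simplified power curve: linear ramp between cut-in and rated speed (assumption)."""
    if v < v_ci or v >= v_co:
        return 0.0
    if v >= v_r:
        return RATED_MW
    return RATED_MW * (v - v_ci) / (v_r - v_ci)


def hourly_risk(p_mw, limit_mw=350.0, p_outage=2e-4, p_dbm=1e-3, p_sbm=1e-5,
                curtail_cost_per_mw=1e3, cost_dbm=5e6, sbm_cost_per_mw=1e3):
    """Expected cost (£) in one hour from GRS normal operation, DBM and SBM.
    An outage with output above the post-contingency limit needs GRS action."""
    excess = max(0.0, p_mw - limit_mw)
    normal = p_outage * (1 - p_dbm) * excess * curtail_cost_per_mw
    dbm = p_outage * p_dbm * cost_dbm if excess > 0 else 0.0
    sbm = p_sbm * p_mw * sbm_cost_per_mw  # spurious trip of the whole output
    return normal, dbm, sbm


def simulate(years, seed=1):
    """Sequential simulation over one continuous hourly series; annual risk totals (£)."""
    rng = random.Random(seed)
    speeds = arma_wind_speeds(8760 * years, rng)
    annual = []
    for yr in range(years):
        hours = speeds[8760 * yr: 8760 * (yr + 1)]
        annual.append(sum(sum(hourly_risk(wind_power(v))) for v in hours))
    return annual


annual = simulate(years=20)
print(f"mean annual risk £{mean(annual):,.0f} "
      f"(range £{min(annual):,.0f} to £{max(annual):,.0f})")
```

Keeping the hourly series continuous across years preserves the chronology that distinguishes sequential simulation from non-sequential state sampling.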
As a cost-effective alternative to transmission system upgrading, SIPS is widely used to
deal with increasingly stressed transmission networks. This has resulted in a
widespread proliferation of SIPS in many networks, leading to increased operational
complexity and a higher probability of unintended or undesired SIPS interactions. As
reviewed in Chapter 2, both the Irish incident on 5th August 2005 and the Nordic event
on 1st December were caused by interactions between overlapping or neighbouring
SIPS, leading to severe consequences, such as a system blackout. Previous methods,
which focused on assessing the performance of a single SIPS, are no longer sufficient.
A procedure to evaluate the risk of undesirable interactions between SIPS is provided in
Chapter 7. Simulations performed on the PJM 5-bus system indicate that the
probability of unintended interactions between SIPS depends strongly on system
conditions. Under stressed conditions with high load demand and generation
output, cascading SIPS operation is more likely to occur. Unintended interactions
between SIPS could have a more severe impact than an individual SIPS
failure. In addition, the operating risk of SIPS, especially the risk caused by SIPS
interactions, increases significantly as the number of schemes in the Power
System rises.
In the near future, system operators will face more severe challenges in managing the
operational risk of SIPS. Therefore, the role of SIPS within the context of long-term
Power System planning is considered. Increasing wind penetration and rising load
levels will increase the operational cost of SIPS. However, the construction of new
transmission circuits significantly reduces the risks associated with SIPS maloperations,
decreases wind curtailment and allows greater access to lower-cost and/or
environmentally friendly energy in the system. Therefore, a SIPS-assisted transmission
upgrading plan helps maximise the reliable operation of the Power System whilst
minimising production and investment costs.
A new type of SIPS with adaptive operational logic, which adjusts to varying system
conditions, was developed in Chapter 7 and used to manage SIPS-induced risk.
Currently, most existing SIPS are built on predetermined seasonal and off-line
mitigation actions. However, adaptive SIPS, made possible by modern IEDs and a
system-wide monitoring system, allow system operators to shift the balance between
dependability and security. When the system is less stressed and has a relatively
low load level compared with the maximum or nominal system load, SBM is
the main source of SIPS operational risk, so more secure protection logic
can be implemented. In contrast, when the system is heavily stressed or when another
scheme is in a failed state, successful operation of the SIPS is vitally important in
preventing cascading system failure, and the scheme needs to be highly dependable.
Adaptive protection not only helps reduce the risk of DBM and SBM of individual SIPS,
but also helps achieve better coordination between SIPS. In this way, the significant
variation in SIPS operational risk caused by fast-changing system conditions can be
effectively mitigated.
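The adaptive principle described above can be sketched as choosing, for the present operating state, whichever tripping logic gives the lower expected risk. In this hypothetical sketch the DBM consequence cost grows with system loading and is multiplied when a neighbouring scheme has failed, while the SBM cost is fixed; the two logics, their probabilities, the cost model and all numbers are illustrative assumptions, not the Chapter 7 design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Logic:
    name: str
    p_dbm: float  # probability of failing to operate on demand (hypothetical)
    p_sbm: float  # probability of a spurious operation per period (hypothetical)


DEPENDABLE = Logic("1-out-of-2", p_dbm=1e-6, p_sbm=2e-3)
SECURE = Logic("2-out-of-2", p_dbm=2e-3, p_sbm=1e-6)


def expected_risk(logic, load_pu, neighbour_failed, p_demand=0.05,
                  cost_sbm=2e5, base_cost_dbm=5e6):
    """Expected cost (£) of one logic in the current operating state (hypothetical model)."""
    # Assumption: DBM consequences grow steeply with loading and are ten times worse
    # when a neighbouring scheme is already out of service.
    cost_dbm = base_cost_dbm * load_pu ** 4 * (10.0 if neighbour_failed else 1.0)
    return p_demand * logic.p_dbm * cost_dbm + logic.p_sbm * cost_sbm


def choose_logic(load_pu, neighbour_failed):
    """Pick the logic with the lower expected risk for the present conditions."""
    return min((DEPENDABLE, SECURE),
               key=lambda lg: expected_risk(lg, load_pu, neighbour_failed))


for load_pu, failed in [(0.6, False), (1.0, False), (0.6, True)]:
    print(f"load {load_pu:.1f} pu, neighbour failed={failed}: "
          f"{choose_logic(load_pu, failed).name}")
```

With these numbers the secure logic is selected at light load, while high loading or a failed neighbouring scheme switches the selection to the dependable logic, mirroring the behaviour described in the text.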
In conclusion, the study undertaken in this research provides a comprehensive
framework to assess the reliability of SIPS designed to prevent system contingencies.
The proposed risk assessment methodologies could assist utilities to determine the
optimal SIPS design and to effectively manage the risks brought by SIPS to Power
System operation.
8.3. Future Work
Based on the work presented in this thesis, suggestions for future work focus on
assessing the emerging opportunities and challenges related to the
reliable operation of a Power System, optimising the SIPS reliability
assessment models and seeking new solutions to enhance SIPS reliability.
Investigating the next step in SIPS development
Currently, most existing SIPS are event-based, with mitigation actions predetermined
seasonally or defined off-line. The enabling technologies that have brought great benefits
and advances in the design and application of SIPS are discussed in this study.
With the platform of centralised SIPS and its wide-area communication infrastructure,
SIPS could use data from a wider area to enhance state estimation, which provides
opportunities for WAMPAC schemes.
When the data required by a SIPS controller are sourced from different locations, it is
crucial that all the data are synchronised to ensure efficient and effective system
operation. Application of the IEEE 1588 Precision Time Protocol can achieve
sub-microsecond time synchronisation accuracy for both LAN and substation-wide time
information [100, 101]. In addition, the strict time requirements of remedial control
actions need to be fulfilled. To achieve this, it is vital to ensure fast computing and data
processing by the central controllers and to confirm the effectiveness of the analysis
tools in delivering valid solutions for all possible system contingencies.
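For context, the basic IEEE 1588 delay request-response exchange estimates the slave clock offset from four timestamps: t1 (Sync sent by the master), t2 (Sync received by the slave), t3 (Delay_Req sent by the slave) and t4 (Delay_Req received by the master). The sketch below applies the standard formulas, which assume a symmetric path delay; the timestamp values are made up, and real implementations add hardware timestamping, filtering and transparent or boundary clock corrections.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Offset and mean path delay from one IEEE 1588 delay request-response exchange.

    t1: Sync sent (master clock)        t2: Sync received (slave clock)
    t3: Delay_Req sent (slave clock)    t4: Delay_Req received (master clock)
    Assumes the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1  # = delay + offset
    slave_to_master = t4 - t3  # = delay - offset
    offset = (master_to_slave - slave_to_master) / 2  # slave clock minus master clock
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Made-up timestamps in nanoseconds: slave 1200 ns ahead, 500 ns one-way delay.
offset, delay = ptp_offset_and_delay(t1=0, t2=1_700, t3=10_000, t4=9_300)
print(offset, delay)  # 1200.0 500.0
```

If the path is asymmetric, the offset estimate is in error by half the difference between the two one-way delays, which is why symmetric network paths or asymmetry correction matter at sub-microsecond accuracy.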
In addition, IEC Technical Report 61850-90-5 [30] provides details of a communication
protocol for event-driven GOOSE messages, designed to extend their application from a
LAN to a WAN. This significantly facilitates the application of SIPS over a wider area.
However, it also raises concerns about the security of GOOSE messages over WAN-based
communication. GOOSE messages need to be cryptographically protected to reduce their
vulnerability to cyber-attack. The Group Domain of Interpretation (GDOI, RFC 6407)
can be used to provide symmetric keys for data signing and encryption.
However, the performance of GOOSE communication during cyber-attacks
and the associated cyber security risk need to be investigated.
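As a minimal illustration of symmetric-key message authentication of the kind a GDOI-distributed group key would enable, the sketch below appends an HMAC-SHA256 tag to a message payload and verifies it with a constant-time comparison. The payload text, the random key standing in for a GDOI-distributed key, and the choice of HMAC-SHA256 are assumptions for illustration; this is not the frame format defined in IEC 61850-90-5, and key distribution itself is not shown.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def sign(key: bytes, payload: bytes) -> bytes:
    """Return payload || HMAC-SHA256(key, payload)."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, frame: bytes):
    """Return the payload if the tag is valid, otherwise None."""
    payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


group_key = os.urandom(32)  # stands in for a key distributed by a GDOI server
frame = sign(group_key, b"GRS trip: stNum=7 sqNum=0")  # hypothetical payload
tampered = frame[:4] + b"X" + frame[5:]                 # one payload byte altered

print(verify(group_key, frame))     # original payload is returned
print(verify(group_key, tampered))  # None: tag no longer matches
```

A tag alone does not stop an attacker from replaying an old, validly signed message, so freshness checks (for example on sequence numbers or timestamps) would also be needed.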
Furthermore, another important aspect of SIPS development is better system
visualisation to enhance the capabilities of wide-area schemes. With the WAMPAC
system and phasor measurement units, real-time system visualisation can be
realised. This provides system operators with greater situational awareness and
allows more precise control of the system, hence reducing the probability of
maloperations caused by human error.
Investigating the impact of demand side management on SIPS risks
In the SMCS-based risk assessment procedure developed in Chapter 6, the IEEE-RTS
load model is used to capture load variation over a calendar year. Testing results
indicate that the risk of SIPS DBM becomes extremely high during severe system
conditions with high wind generation output and extreme load demand. Demand side
management, enabled by innovations in advanced metering infrastructure,
communications and smart appliances, brings more sophisticated demand response
options and helps shift consumption from peak hours to off-peak hours. Appropriate load
shifting is becoming more crucial with the growing uptake of electric vehicles (EVs).
However, changing human behaviour through electricity pricing or direct control is
problematic and is often less effective than expected [102].
Hence, it is recommended that future work consider how demand side management can
reduce SIPS operational risks. By using either direct load control (DLC) or real-time
pricing (RTP), the peak-to-average ratio (PAR) of load demand can be effectively
reduced [100, 103]. If this is achieved, the very high risk of SIPS operation
during severe system conditions could be mitigated. Furthermore, the operation of SIPS
and demand side management strategies need to be coordinated to achieve optimal
system reliability.
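The peak-to-average ratio mentioned above is the peak of a load profile divided by its mean. The sketch below computes PAR for a hypothetical 24-hour profile and applies a crude valley-filling heuristic that moves a fixed amount of flexible demand, one block at a time, from the current peak hour to the current minimum hour. The profile, the amount of flexible demand and the heuristic are illustrative assumptions, not the DLC or RTP schemes of [100, 103].

```python
def par(profile):
    """Peak-to-average ratio of a load profile."""
    return max(profile) / (sum(profile) / len(profile))


def shift_load(profile, flexible_mw, step_mw=10.0):
    """Greedy valley filling: move up to flexible_mw of demand from the current peak
    hour to the current valley hour in step_mw blocks; total energy is unchanged."""
    load = [float(x) for x in profile]
    moved = 0.0
    while moved + step_mw <= flexible_mw:
        hi = max(range(len(load)), key=load.__getitem__)
        lo = min(range(len(load)), key=load.__getitem__)
        if load[hi] - load[lo] <= step_mw:  # a further move would not lower the peak
            break
        load[hi] -= step_mw
        load[lo] += step_mw
        moved += step_mw
    return load


# Hypothetical hourly demand (MW) for one day.
demand = [620, 590, 570, 560, 560, 580, 650, 760, 840, 870, 880, 890,
          880, 870, 860, 870, 920, 1010, 1050, 1020, 950, 850, 740, 670]
shifted = shift_load(demand, flexible_mw=300.0)
print(f"PAR before: {par(demand):.3f}, after: {par(shifted):.3f}")
```

Because the average is unchanged by shifting, any reduction in PAR comes entirely from lowering the peak, which is the condition under which the SIPS DBM risk was found to be highest.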
Verification of Protection Asset End-of-life Analysis
The application of an end-of-life evaluation process to support and validate an asset life
extension decision for various selected relay types was described in Chapter 4. Because
of the critical function of protection devices, the assessment of asset life is undertaken
during the reliable service life of the equipment, before any significant increase in
ageing-related failures.
To validate the effectiveness of end-of-life assessment based on sample testing with
limited lifetime data, specific follow-up checking procedures are required for each
relay type. The following checks can be carried out as more ageing-related failures are
identified:
1) For any targeted relay that fails in service during the extended lifetime, the failure
needs to be investigated and the results reported, including any impact on the
replacement life policy and on the conclusions of this set of reports.
2) Once more ageing-related failure data are available, statistical analysis can be carried
out to predict the “rising edge” of the bath-tub curve for the protection devices. The
estimated reliable service life of the protection asset can then be compared with the
conclusions drawn in this study.
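Once enough ageing-related failures are recorded, one common way to look for the rising edge of the bath-tub curve is to fit a two-parameter Weibull distribution: a shape parameter β > 1 indicates an increasing (wear-out) hazard rate. The sketch below uses median rank regression with Benard's approximation and assumes complete, uncensored failure ages; real protection fleets are heavily right-censored, so a maximum-likelihood fit that handles suspensions would normally be preferred. The failure ages are hypothetical.

```python
import math


def weibull_mrr(failure_ages):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Uses Benard's approximation F_i = (i - 0.3) / (n + 0.4) and regresses
    y = ln(-ln(1 - F)) on x = ln(t), so that y = beta * x - beta * ln(eta).
    Assumes complete (uncensored) data.
    """
    t = sorted(failure_ages)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    eta = math.exp(-intercept / beta)
    return beta, eta


def hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical ages (years) at ageing-related failure.
ages = [24.1, 26.5, 27.8, 29.0, 30.2, 31.5, 33.0, 35.4]
beta, eta = weibull_mrr(ages)
print(f"beta = {beta:.2f}, eta = {eta:.1f} years")
```

An estimated β well above 1, together with a hazard rate that climbs across the extended service period, would indicate that the rising edge has been reached and that the life-extension conclusions should be revisited.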
Furthermore, with the experience gained in the present study, the evaluation process can
be applied to other electronic devices with similar components and hardware
platforms.
Development of reliability database
One of the main concerns in reliability assessment is the accuracy of the reliability data,
which significantly affects the usefulness of the results. Currently,
most of the data used in reliability assessment are based on reliability standards
rather than field performance. Sensitivity studies, as illustrated in this thesis, are an
effective way to take account of the uncertainty in reliability data. However, in
the future, a reliability database based on field performance would be
of considerable use in increasing the accuracy of reliability assessment and
improving the quality of Power System risk prediction. Long-term
monitoring of Power System component defects and tracking of their operating
conditions would also contribute to a better understanding of component life cycles.
References
[1] S. H. Horowitz and A. G. Phadke, "Boosting immunity to blackouts," IEEE
Power and Energy Magazine, vol. 1, pp. 47-53, 2003.
[2] I. Bazovsky, Reliability Theory and Practice. Dover Publications, 1961.
[3] CIGRE, "Power System Reliability Analysis," CIGRE WG 03 of SC
38, 1987.
[4] R. Billinton and R. N. Allan, Reliability Assessment of Large Electric Power
Systems. Kluwer Academic Publishers, 1988.
[5] R. Billinton and R. N. Allan, Reliability Evaluation of Power Systems. Plenum
Press, 1996.
[6] R. Billinton and W. Li, Reliability Assessment of Electric Power Systems Using
Monte Carlo Methods. Plenum Press, New York, 1994.
[7] P. Kundur, J. Paserba, V. Ajjarapu, G. Andersson, A. Bose, C. Canizares, N.
Hatziargyriou, D. Hill, A. Stankovic, C. Taylor, T. V. Cutsem, and V. Vittal,
"Definition and classification of power system stability IEEE/CIGRE joint task
force on stability terms and definitions," IEEE Transactions on Power Systems,
vol. 19, pp. 1387-1401, 2004.
[8] F. Rahimi, A. Ipakchi, and F. Fletcher, "The Changing Electrical Landscape:
End-to-End Power System Operation Under the Transactive Energy Paradigm,"
IEEE Power and Energy Magazine, vol. 14, pp. 52-62, 2016.
[9] NERC System Disturbance Reports. Available:
http://www.nerc.com/pa/rrm/ea/System%20Disturbance%20Reports%20DL/Forms/AllItems.aspx
[10] "Report on Investigation into System Disturbance of August 5th 2005,"
Electricity Supply Board (ESB) and National Grid, Dec. 2005.
[11] J. Walseth, J. Eskedal, and O. Breidablik, "Analysis of Misoperations of
Protection Schemes in the Nordic Grid," Protection, Automation, and Control
World, March, 2010.
[12] US-Canada Power System Outage Task Force, "Blackout 2003: Final report on
the August 14, 2003 blackout in the United States and Canada: Causes and
recommendations," Office of Electricity Delivery & Energy
Reliability, Washington, DC, 2004.
[13] "Report of the Enquiry Committee on grid disturbance in northern region on 30th
July 2012 and in northern, eastern & northeastern region on 31st July 2012," The
Enquiry Committee, Ministry of Commerce and Industry, Government of India,
New Delhi, India, 2012.
[14] V. Madani, D. Novosel, S. Horowitz, M. Adamiak, J. Amantegui, D. Karlsson, S.
Imai, and A. Apostolov, "IEEE PSRC Report on Global Industry Experiences
With System Integrity Protection Schemes (SIPS)," IEEE Transactions on Power
Delivery, vol. 25, pp. 2143-2155, 2010.
[15] S. H. Horowitz, D. Novosel, V. Madani, and M. Adamiak, "System-wide
Protection," IEEE Power and Energy Magazine, vol. 6, pp. 34-42, 2008.
[16] V. Madani, M. Adamiak, and M. Thakur, "Design and implementation of wide
area special protection schemes," in 57th Annual Conference for Protective Relay
Engineers, 2004, pp. 392-402.
[17] P. M. Anderson and B. K. LeReverend, "Industry experience with special
protection schemes," IEEE Transactions on Power Systems, vol. 11, pp. 1166-
1179, 1996.
[18] WECC Relay Work Group, "Remedial Action Scheme Design Guide," February
2006.
[19] W. Winter and B. LeReverend, "Operational performance of bulk electricity
system control aids," Electra, no. 123, March 1989.
[20] J. D. McCalley and F. Weihui, "Reliability of special protection systems," IEEE
Transactions on Power Systems, vol. 14, pp. 1400-1406, 1999.
[21] NERC reliability standards. (16 Mar). Protection and control. Available:
http://www.nerc.net/standardsreports/standardssummary.aspx
[22] ISA, "Safety Instrumented Functions (SIF) - Safety Integrity Level (SIL)
Evaluation Techniques," 17 June 2002.
[23] Wikipedia. (16 Mar). Available: http://en.wikipedia.org/wiki/Spurious_trip_level
[24] K. Harker, "The north wales supergrid special protection schemes," Electronics
and Power, vol. 30, pp. 719-724, 1984.
[25] M. Panteli, P. A. Crossley, and J. Fitch, "Quantifying the reliability level of
system integrity protection schemes," IET Generation, Transmission &
Distribution, vol. 8, pp. 753-764, 2014.
[26] D. Miller, R. Schloss, and S. Manson, "PacifiCorp’s Jim Bridger RAS: A dual
triple modular redundant case study," Mar. 2, 2009.
[27] K. Baskin, M. Thompson, and L. Lawhead, "Design and testing of a system to
classify faults for a generation-shedding RAS," in 2009 62nd Annual Conference
for Protective Relay Engineers, 2009, pp. 140-149.
[28] J. Wen, W. H. E. Liu, P. L. Arons, and S. K. Pandey, "Evolution Pathway
Towards Wide Area Monitoring and Protection—A Real-World
Implementation of Centralized RAS System," IEEE Transactions on Smart Grid,
vol. 5, pp. 1506-1513, 2014.
[29] IEC 61850 communication networks and systems in substations Specific
Communication Service Mapping (SCSM) - Mapping to MMS (ISO 9506-1 and
ISO 9506-2) and to ISO/IEC 8802-3, pt. 8-1. Available: http://www.iec.ch
[30] IEC TR 61850-90-5:2012, "Communication networks and systems for power
utility automation - Part 90-5: Use of IEC 61850 to transmit synchrophasor
information according to IEEE C37.118," 2012.
[31] All Island Transmission System Map. Available:
http://smartgriddashboard.eirgrid.com/#all/transmission-map
[32] V. Terzija, G. Valverde, D. Cai, P. Regulski, V. Madani, J. Fitch, S. Skok, M. M.
Begovic, and A. Phadke, "Wide-Area Monitoring, Protection, and Control of
Future Electric Power Networks," Proceedings of the IEEE, vol. 99, pp. 80-93,
2011.
[33] S. Tamronglak, S. H. Horowitz, A. G. Phadke, and J. S. Thorp, "Anatomy of
power system blackouts: preventive relaying strategies," IEEE Transactions on
Power Delivery, vol. 11, pp. 708-715, 1996.
[34] S. H. Horowitz, A. G. Phadke, and J. S. Thorp, "Adaptive transmission system
relaying," IEEE Transactions on Power Delivery, vol. 3, pp. 1436-1445, 1988.
[35] K. Chul-Hwan, H. Jeong-Yong, and R. K. Aggarwal, "An enhanced zone 3
algorithm of a distance relay using transient components and state diagram,"
IEEE Transactions on Power Delivery, vol. 20, pp. 39-46, 2005.
[36] S. Sheng, K. K. Li, W. L. Chan, X. Zeng, D. Shi, and X. Duan, "Adaptive Agent-
Based Wide-Area Current Differential Protection System," IEEE Transactions on
Industry Applications, vol. 46, pp. 2111-2117, 2010.
[37] M. Begovic, D. Novosel, D. Karlsson, C. Henville, and G. Michel, "Wide-Area
Protection and Emergency Control," Proceedings of the IEEE, vol. 93, pp. 876-
891, 2005.
[38] IEC 61850 Communication networks and systems in substations—Use of IEC
61850 for the communication between substations pt. 90–1. Available:
http://www.iec.ch
[39] J. Sykes, M. Adamiak, and G. Brunello, "Implementation and Operational
Experience of a Wide Area Special Protection Scheme on the SRP System," in
2006 Power Systems Advanced Metering, Protection, Control, Communication,
and Distributed Resources, 2006, pp. 145-158.
[40] M. G. Adamiak, A. P. Apostolov, M. M. Begovic, C. F. Henville, K. E. Martin, G.
L. Michel, A. G. Phadke, and J. S. Thorp, "Wide Area
Protection—Technology and Infrastructures," IEEE Transactions on Power
Delivery, vol. 21, pp. 601-609, 2006.
[41] Y. Wang, W. Li, and J. Lu, "Reliability Analysis of Wide-Area Measurement
System," IEEE Transactions on Power Delivery, vol. 25, pp. 1483-1491, 2010.
[42] Alstom Grid, Network Protection & Automation Guide: Protective Relays,
Measurement & Control, May 2011.
[43] IEC 61850, "Communication networks and systems in substations," Institute of
Electrical and Electronics Engineers, Tech. Rep., 2002-2005.
[44] P. Zhang, L. Portillo, and M. Kezunovic, "Reliability and Component Importance
Analysis of All-Digital Protection Systems," in 2006 IEEE PES Power Systems
Conference and Exposition, 2006, pp. 1380-1387.
[45] P. Leupp and C. Rytoft, "Special Report IEC 61850," ABB.
[46] J. Wen, C. Hammond, and E. A. Udren, "Wide-area Ethernet network
configuration for system protection messaging," in 2012 65th Annual Conference
for Protective Relay Engineers, 2012, pp. 52-72.
[47] "IEEE Standard Communication Delivery Time Performance Requirements for
Electric Power Substation Automation," IEEE Std 1646-2004, pp. 1-24, 2005.
[48] K.P. Brand, C. Brunner, and W. Wimmer, "Design of IEC61850 based Substation
Automation System according to Customer Requirements," CIGRE Plenary
meeting, Session of SC B5, Paper B5-103, Paris, 2004.
[49] G. Antonova, L. Frisk, and J. C. Tournier, "Communication redundancy for
substation automation," in 2011 64th Annual Conference for Protective Relay
Engineers, 2011, pp. 344-355.
[50] International Electrotechnical Commission IEC 62439-3, "Industrial
communication networks - High availability automation networks - Part 3:
Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy
(HSR)," 2016.
[51] B. Kasztenny, J. Whatley, and E. A. Udren, "IEC 61850: A Practical Application
Primer for Protection Engineers," 60th Annual Georgia Tech Protective Relaying
Conference, Atlanta, GA, May 3-5, 2006.
[52] H. Hajian-Hoseinabadi, "Availability Comparison of Various Power Substation
Automation Architectures," IEEE Transactions on Power Delivery, vol. 28, pp.
566-574, 2013.
[53] H. Hajian-Hoseinabadi and M. E. H. Golshan, "Availability, Reliability, and
Component Importance Evaluation of Various Repairable Substation Automation
Systems," IEEE Transactions on Power Delivery, vol. 27, pp. 1358-1367, 2012.
[54] L. R. C. Ferreira, P. A. Crossley, J. Goody, and R. N. Allan, "Reliability
evaluation of substation control systems," IEE Proceedings - Generation,
Transmission and Distribution, vol. 146, pp. 626-632, 1999.
[55] L. Andersson, K. P. Brand, C. Brunner, and W. Wimmer, "Reliability
investigations for SA communication architectures based on IEC 61850," in 2005
IEEE Russia Power Tech, 2005, pp. 1-7.
[56] "IEEE Recommended Practice for Design of Reliable Industrial and Commercial
Power Systems (IEEE Gold Book)," IEEE Standard 493-2007, 2007.
[57] IETF RFC 3376, "Internet Group Management Protocol, Version 3," October
2002.
[58] B. Beresh and B. Machie, "I22: End-Of-Life Assessment of P&C Devices," PSRC
Working Group, May 2015.
[59] P. J. Smith, M. Shafi, and G. Hongsheng, "Quick simulation: a review of
importance sampling techniques in communications systems," IEEE Journal on
Selected Areas in Communications, vol. 15, pp. 597-613, 1997.
[60] H. Maciejewski, G. J. Anders, and J. Endrenyi, "On the use of statistical methods
and models for predicting the end of life of electric power equipment," in 2011
International Conference on Power Engineering, Energy and Electrical Drives,
2011, pp. 1-6.
[61] V. I. Kogan, J. A. Fleeman, J. H. Provanzana, and C. H. Shih, "Failure analysis of
EHV transformers," IEEE Transactions on Power Delivery, vol. 3, pp. 672-683,
1988.
[62] M. T. Schilling, J. C. G. Praca, J. F. d. Queiroz, C. Singh, and H. Ascher,
"Detection of ageing in the reliability analysis of thermal generators," IEEE
Transactions on Power Systems, vol. 3, pp. 490-499, 1988.
[63] B. Retterath, S. S. Venkata, and A. A. Chowdhury, "Impact of time-varying
failure rates on distribution reliability," in 2004 International Conference on
Probabilistic Methods Applied to Power Systems, 2004, pp. 953-958.
[64] L. Wenyuan, "Evaluating mean life of power system equipment with limited end-
of-life failure data," in IEEE Power Engineering Society General Meeting, 2005,
2005, p. 2390 Vol. 3.
[65] J. E. Cota-Felix, F. Rivas-Davalos, and S. Maximov, "An alternative method for
estimating mean life of power system equipment with limited end-of-life failure
data," in 2009 IEEE Bucharest PowerTech, 2009, pp. 1-4.
[66] R. L. M. Auterei, T. Rahman, A. Wen, D. Zee, P. J. Tavener, "Investigation on
Ageing and Life Extension of Protective Relays," Dec 2009 – Nov 2011.
[67] National Grid, "SHNB, THR and LFCB Relay Age Profile (internal documents)," 2013.
[68] National Grid, "FAULTS-2000-2013 –PROTN Malops THR-SHNB-LFCB," 2014.
[69] National Grid, "Thermal Overload Capabilities of Protection Equipment," TGN(E) 66,
issue 3, September 2000.
[70] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – SHNB,"
Quanta Technology, LLC for National Grid, February 2015.
[71] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – THR,"
Quanta Technology, LLC for National Grid, February 2015.
[72] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – LFCB,"
Quanta Technology, LLC for National Grid, February 2015.
[73] P. A. Agyakwa, L. Yang, M. R. Corfield, and C. M. Johnson, "A non-destructive
study of crack development during thermal cycling of Al wire bonds using x-ray
computed tomography," in CIPS 2014; 8th International Conference on
Integrated Power Electronics Systems, 2014, pp. 1-5.
[74] F. Weihui, Z. Sanyi, J. D. McCalley, V. Vittal, and N. Abi-Samra, "Risk
assessment for special protection systems," in 2002 IEEE Power Engineering
Society Winter Meeting. Conference Proceedings (Cat. No.02CH37309), 2002, p.
740 vol.2.
[75] M. Esmaili, A. Hajnoroozi, and H. Shayanfar, "Risk Evaluation of Online Special
Protection Systems," International Journal of Electrical Power & Energy Systems,
vol. 41, pp. 137-144, 2012.
[76] T. Y. Hsiao and C. N. Lu, "Risk Informed Design Refinement of a Power System
Protection Scheme," IEEE Transactions on Reliability, vol. 57, pp. 311-321, 2008.
[77] J. L. C. d. Miguel, P. J. Ramírez, S. H. Tindemans, and G. Strbac, "Cost-benefit
analysis of unreliable System Protection Scheme operation," in 2015 IEEE
Eindhoven PowerTech, 2015, pp. 1-6.
[78] J. L. Calvo, S. H. Tindemans, and G. Strbac, "Managing risks from reverse flows
under distribution network outage scenarios," in IET International Conference on
Resilience of Transmission and Distribution Networks (RTDN) 2015, 2015, pp. 1-
6.
[79] C. Shipman, K. Hopkinson, and J. Lopez, "Con-resistant trust for improved
reliability in a smart grid special protection system," in 2015 IEEE Power &
Energy Society General Meeting, 2015, pp. 1-1.
[80] R. Billinton and A. Sankarakrishnan, "A comparison of Monte Carlo simulation
techniques for composite power system reliability assessment," in IEEE
WESCANEX 95. Communications, Power, and Computing. Conference
Proceedings, 1995, pp. 145-150 vol.1.
[81] UK National Grid. (2016). Electricity Ten Year Statement 2016. Available:
http://www2.nationalgrid.com/UK/Industry-information/Future-of-
Energy/Electricity-Ten-Year-Statement/
[82] International Electrotechnical Commission, "Communication networks and systems in
substations - Part 9-2: Specific Communication Service Mapping (SCSM) - Sampled values
over ISO/IEC 8802-3," IEC 61850-9-2, 2004.
[83] V. Madani, E. Taylor, D. Erwin, A. Meklin, and M. Adamiak, "High-Speed
Control Scheme to Prevent Instability of A Large Multi-Unit Power Plant," in
2007 60th Annual Conference for Protective Relay Engineers, 2007, pp. 271-282.
[84] R. Billinton and R. N. Allan, Reliability Evaluation of Engineering Systems, 1983.
[85] C. Grigg, P. Wong, P. Albrecht, R. Allan, M. Bhavaraju, R. Billinton, Q. Chen, C.
Fong, S. Haddad, S. Kuruganty, W. Li, R. Mukerji, D. Patton, N. Rau, D. Reppen,
A. Schneider, M. Shahidehpour, and C. Singh, "The IEEE Reliability Test
System-1996. A report prepared by the Reliability Test System Task Force of the
Application of Probability Methods Subcommittee," IEEE Transactions on
Power Systems, vol. 14, pp. 1010-1020, 1999.
[86] E. Leahy and R. S. J. Tol, "An Estimate of the Value of Lost Load for Ireland,"
ESRI Working Paper 357, 2010.
[87] Electricity Networks Strategy Group, "Our Electricity Transmission Network: A
Vision for 2020," Technical Report URN: 09D/717, July 2009.
[88] UK National Grid. (2016). System Operability Framework 2016. Available:
http://www2.nationalgrid.com/UK/Industry-information/Future-of-
Energy/System-Operability-Framework/
[89] G. Sinden, "Characteristics of the UK wind resource: Long-term patterns and
relationship to electricity demand," Energy Policy, vol. 35, pp. 112–117, 2007.
[90] A. J. Roscoe and G. Ault, "Supporting high penetrations of renewable generation
via implementation of real-time electricity pricing and demand response," IET
Renewable Power Generation, vol. 4, pp. 369-382, 2010.
[91] UK National Grid. (2016). Network Options Assessment. Available:
http://www2.nationalgrid.com/UK/Industry-information/Future-of-
Energy/Network-Options-Assessment/
[92] P. Glynn and W. Whitt, "The asymptotic validity of sequential stopping rules for
stochastic simulations," Ann. Appl. Probab., vol. 2, pp. 180–198, 1992.
[93] University of Reading, "Documentation for wind profile program."
[94] R. Billinton, C. Hua, and R. Ghajar, "A sequential simulation technique for
adequacy evaluation of generating systems including wind energy," IEEE
Transactions on Energy Conversion, vol. 11, pp. 728-734, 1996.
[95] Vestas, "V90-e.0MW Turbine."
[96] J. McCalley, O. Oluwaseyi, V. Krishnan, R. Dai, C. Singh, and K. Jiang, "System
Protection Schemes: Limitations, Risks, and Management," PSERC Publications,
December 2010.
[97] O. Olatujoye, V. Krishnan, and J. McCalley, "Including special protection
schemes and operational complexity within transmission planning," Power and
Energy Society General Meeting, 2011.
[98] F. Li and R. Bo, "Small test systems for power system economic studies," in
IEEE PES General Meeting, 2010, pp. 1-4.
[99] W. Fu, S. Zhao, J. D. McCalley, V. Vittal, and N. Abi-Samra, "Risk
assessment for special protection systems," IEEE Transactions on Power Systems,
vol. 17, pp. 63-72, 2002.
[100] A. H. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and A. Leon-
Garcia, "Autonomous Demand-Side Management Based on Game-Theoretic
Energy Consumption Scheduling for the Future Smart Grid," IEEE Transactions
on Smart Grid, vol. 1, pp. 320-331, 2010.
[101] C. M. D. Dominicis, P. Ferrari, A. Flammini, S. Rinaldi, and M. Quarantelli, "On
the Use of IEEE 1588 in Existing IEC 61850-Based SASs: Current Behavior and
Future Challenges," IEEE Transactions on Instrumentation and Measurement,
vol. 60, pp. 3070-3081, 2011.
[102] Y. Li and P. A. Crossley, "Voltage balancing in low-voltage radial feeders using
Scott transformers," IET Generation, Transmission & Distribution, vol. 8, pp.
1489-1498, 2014.
[103] P. Palensky and D. Dietrich, "Demand Side Management: Demand Response,
Intelligent Energy Systems, and Smart Loads," IEEE Transactions on Industrial
Informatics, vol. 7, pp. 381-388, 2011.
Appendix A: Protection Fingerprint Testing
A.1 Line Parameters and Protection Settings
The settings of each relay type and the circuit parameters of the protected lines were
provided by National Grid and are based on its 400 kV transmission system. The
fingerprint tests were performed using the following parameters.
1) SHNB:
Line Parameters:
Line length: 81 km
Current Transformer Ratio: 2000/1
Voltage Transformer Ratio: 400 kV/110 V
Line Impedance (% on 100 MVA base):
Z1 = 0.1718 + j1.5709
Z0 = 0.7282 + j4.3201
Z0m = 0.5564 + j2.3319
SHNB MicroMho Relay Settings:
Vn = 110 V (secondary voltage rating)
In = 1 A (secondary current rating)
Residual Compensation Factor (R.C.F) = 0.538
Relay Characteristic Angle (RCA) = 85°
Scheme Selection: X = 1, Y = 1
Zone Settings:
Zone 1: Forward reach = 11.52 Ω sec (80%), Trip time: 0 s
Zone 2: Forward reach = 21.6 Ω sec (150%), Trip time: 500 ms
Zone 3: Forward reach = 28.8 Ω sec (200%), Reverse reach = 2.4 Ω sec (-16.5%), Trip time: 1 s
2) THR:
Line Parameters:
Line length: 51 km
Current Transformer Ratio: 2000/1
Voltage Transformer Ratio: 400 kV/110 V
Line Impedance (% on 100 MVA base):
Z1 = 0.055 + j0.8534
Z0 = 0.3306 + j2.5346
Z0m = 0.2745 + j1.3521
THR Relay Settings:
Vn = 110 V (secondary voltage rating)
In = 2 A (secondary current rating)
Residual Compensation Factor (R.C.F) = 0.627
Relay Characteristic Angle (RCA) = 75°
Zone Settings:
Zone 1: Forward reach = 6.12 Ω sec (81.4%), Trip time: 0 s
Zone 2: Forward reach = 11.63 Ω sec (155%), Trip time: 500 ms
Zone 3: Forward reach = 19.58 Ω sec (260%), Reverse reach = 1.96 Ω sec (26.1%), Trip time: 1 s
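As a cross-check on the tabulated distance reaches, the Python sketch below converts a line impedance given in % on a 100 MVA, 400 kV base into relay secondary ohms (multiplying by the CT ratio and dividing by the VT ratio) and scales it by each zone percentage. It assumes the percentages refer to the magnitude of the positive-sequence impedance Z1; the helper names are illustrative only. For the THR line it gives |Z1| ≈ 7.53 Ω secondary and reaches within about 0.03 Ω of the tabulated values. The same conversion applied to the SHNB data gives about 13.9 Ω rather than the 14.4 Ω implied by its settings, so the tabulated settings should be treated as authoritative.

```python
# Sketch: refer a line impedance in % on a common MVA base to relay
# secondary ohms and derive zone reaches. Assumes reaches are percentages
# of |Z1|; the helper names are illustrative, not from the source.

def secondary_ohms(z_percent: complex, kv: float = 400.0, mva: float = 100.0,
                   ct_ratio: float = 2000 / 1, vt_ratio: float = 400e3 / 110) -> float:
    """|Z| in secondary ohms for an impedance given in % on the kv/mva base."""
    z_base = kv ** 2 / mva                        # primary base impedance (ohms)
    z_primary = abs(z_percent) / 100.0 * z_base   # primary ohms
    return z_primary * ct_ratio / vt_ratio        # secondary = primary x CTR / VTR


def zone_reaches(z_line_sec: float, percents: dict) -> dict:
    """Scale the line impedance by each zone's reach percentage."""
    return {zone: round(z_line_sec * p / 100.0, 2) for zone, p in percents.items()}


# THR line: Z1 = 0.055 + j0.8534 % on 100 MVA
z_sec = secondary_ohms(complex(0.055, 0.8534))
thr_reaches = zone_reaches(z_sec, {"Zone 1": 81.4, "Zone 2": 155, "Zone 3": 260, "Reverse": 26.1})
print(round(z_sec, 2))   # 7.53
print(thr_reaches)       # {'Zone 1': 6.13, 'Zone 2': 11.66, 'Zone 3': 19.57, 'Reverse': 1.96}
```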
3) LFCB:
Protection Settings:
Lower Slope Threshold (IS1): 0.20 p.u.
Lower Slope % Bias (k1): 30%
Upper Slope Threshold (IS2): 2.00 p.u.
Upper Slope % Bias (k2): 150%
Permissive Intertrip Time (PIT): 0 sec
Communications Settings:
Comms. Channel Delay Tolerance = 250 µs
Comms. Channel Failure Alarm Time = 9.9 sec
Relay Address = 1-A
Serial Port Settings:
Baud Rate: 4800 baud
Bit Framing Format: Data bits: 8 bits
Parity: none
Stop bits: 1 bit
Remote Access Level: Limited
Scheme Logic Settings:
Block Auto-reclose Mode: PIT & 3PH Fault
Tripping Mode: Three-pole
Configuration: 2 ended
Time Synchronization: Time Sync. Period: 30 min
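The LFCB protection settings above define a dual-slope percentage-bias characteristic. The sketch below assumes the form commonly used by two-ended current differential relays: differential current |I_L + I_R|, bias current 0.5(|I_L| + |I_R|), slope k1 with threshold IS1 up to a bias of IS2, and slope k2 above it, offset so the two segments meet at IS2. These definitions are an assumption rather than taken from the LFCB documentation, and the function names are hypothetical.

```python
# Sketch of a dual-slope percentage-bias differential element using the
# LFCB settings listed above. The differential/bias definitions and the
# function names are assumptions (a common two-ended form), not the
# relay's documented algorithm.

IS1, K1 = 0.20, 0.30   # lower slope threshold (p.u.), lower slope bias (30 %)
IS2, K2 = 2.00, 1.50   # upper slope threshold (p.u.), upper slope bias (150 %)


def operate_threshold(i_bias: float) -> float:
    """Minimum differential current (p.u.) required to operate at a given bias."""
    if i_bias < IS2:
        return K1 * i_bias + IS1
    # Upper slope, offset so both segments meet at i_bias == IS2
    return K2 * i_bias - (K2 - K1) * IS2 + IS1


def trips(i_local: complex, i_remote: complex) -> bool:
    """Phasor currents (p.u.), both measured flowing into the protected line."""
    i_diff = abs(i_local + i_remote)
    i_bias = 0.5 * (abs(i_local) + abs(i_remote))
    return i_diff > operate_threshold(i_bias)


print(trips(1.0, -1.0))    # False: load current flowing through the line
print(trips(3.0, 2.0))     # True: internal fault fed from both ends
print(trips(10.0, -8.0))   # False: heavy through fault with 2 p.u. CT mismatch
```

The third case illustrates the usual motivation for the steeper upper slope: a large through-fault current with some measurement mismatch stays inside the restraint region.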
A.2 PSCAD model used for dynamic fault based distance relay testing
Figure A-1: PSCAD Model used for Dynamic Fault based Distance Relay Testing
Appendix B: Vulnerable Components Assessment
B.1 Vulnerable Components Examined via X-ray Tomography
Relay type | No. | Component | Module (function) | Relay (type/serial no.) | Relay age (years) | Operating temp. (°C)
SHNB | 1 | HMOS single-component microcontroller (plastic encapsulated), P8039AHL L5222957 (INTEL 1977) | 16 (Microprocessor) | SHNB102 | 17 | 33
SHNB | 2 | Voltage regulator (plastic encapsulated), MC T7805CT | 21/23/25 (Comparator) | SHNB102 | 17 | 32
SHNB | 3 | JFET-input operational amplifier, M53AK LF355H | 30 (Voltage input) | SHNB102 | 17 | 34
SHNB | 4 | Operational amplifier, UA741CN | 13 (Voltage supervision) | SHNB101 | 7-8 | -
SHNB | 5 | Operational amplifier, UA741CN | 13 (Voltage supervision) | SHNB102 | 17 | 33
SHNB | 6 | Operational amplifier, UA741CN | 18 (Phase & Neutral) | SHNB102 | 17 | 27
SHNB | 7 | Voltage regulator (metal can package) | 21/23/25 (Comparator) | | 22 | 29
SHNB | 8 | Voltage regulator (metal can package) | 21/23/25 (Comparator) | SHNB101 | 7-8 | -
SHNB | 9 | Small-signal diode, BAV21 | 13 (Voltage supervision) | SHNB102 | 17 | 33
THR | 1 | TO-18 metal-case transistor, 2N2906 | Power Supply UV-OV-OC Card (T4) | THR | 32 | ≤ 25
THR | 2 | TO-18 metal-case transistor, 2N2222A | Earth Fault Box V.T.S Module (T45) | THR | 32 | ≤ 25
THR | 3 | Zener diode, BZY88C | Phase Fault Box Z1B-R comparator (D6) | THR | 32 | ≤ 29
THR | 4 | Film resistor, 2.2 kΩ (±5%), 2 W | Power Supply Output Regulator Card (R22) | THR | 32 | 49
LFCB | 1 | Voltage regulator (plastic), HM91AR LM940T | 4 (GM0052021) (Communication controller board) | LFCB192(102) 547373C | 15 | 41
LFCB | 2 | Voltage regulator (plastic), 7812CT | 4 (GM0052021) (Communication controller board) | LFCB192(102) 547373C | 15 | 35
LFCB | 3 | Enhanced serial communications controller (ESCC), AM85C30-8JC | 4 (GM0052021) | LFCB192(102) 547373C | 15 | 33
LFCB | 4 | Voltage regulator (plastic) | 4 (GM0052021) (Communication interface board) | LFCB103 208284J | 8 | -
LFCB | 5 | Voltage regulator (plastic), 7812CT | 4 (GM0052021) (Controller board) | LFCB103 208284J | 8 | -
B.2 Component Degradation Mechanisms
Degradation Mechanisms of Transistor Packages
The anatomy of a generic single-chip transistor package is shown in Figure B-1.
It depicts a monolithic integrated circuit with a centrally positioned leadframe pad or
substrate to which the semiconductor die is attached. Bond wires provide
interconnections between the die and the leads, and a polymeric mould compound holds
the assembly together, prevents the ingress of moisture and dust, and provides
insulating dielectric properties.
Figure B-1: Typical Plastic Encapsulated Transistor Package
During operation of a transistor package, heat is generated within the semiconductor
chip(s) by switching and conduction losses, and this heat must be removed (i.e.
transferred to the ambient air) as efficiently as possible to maximise the electrical
performance and mechanical reliability of the component. Heat is conducted away
through the package leadframe, allowing the component to remain within its optimum
operating temperature limits. The lower the junction temperature of the device, the more
reliably it will function. A pathway with low thermal impedance must therefore be
created from the device level to a point where the heat can be dissipated safely without
damaging the circuit. The following reliability considerations need to be assessed:
1) Die attachment / solder joint reliability: the effectiveness of the thermal path is
largely determined by the die attachment or solder layer between the chip and the
leadframe pad.
2) Wire bond reliability: wire bond lift-off is one of the most common causes of failure.
Lift-off is highly undesirable because it leads to loss of electrical interconnection and
impairment or failure of function.
3) Leadframe reliability: moisture that penetrates the encapsulant and migrates to the
resin-leadframe interfaces can cause delamination and loss of voltage withstand
capability, leading to open-circuit failures.
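The thermal-path argument above can be made concrete with a series thermal-resistance model, in which the junction temperature is Tj = Ta + P(R_die + R_attach + R_leadframe + R_amb). The sketch below assumes purely steady-state conduction, and all resistance and power values are hypothetical placeholders rather than data for the relay components examined here. It shows that a degraded die attach raises Tj in proportion to the dissipated power.

```python
# Sketch: steady-state junction temperature for heat conducted through a
# series chain of thermal resistances. All values are hypothetical.

def junction_temperature(t_ambient_c: float, power_w: float, r_th_k_per_w: list) -> float:
    """Tj = Ta + P * (sum of series thermal resistances)."""
    return t_ambient_c + power_w * sum(r_th_k_per_w)


# Hypothetical path: die, die attach, leadframe, leadframe-to-ambient (K/W)
healthy = [2.0, 5.0, 10.0, 50.0]
degraded = [2.0, 10.0, 10.0, 50.0]   # die-attach resistance doubled, e.g. by voiding

print(junction_temperature(25.0, 0.5, healthy))    # 58.5
print(junction_temperature(25.0, 0.5, degraded))   # 61.0
```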
Degradation Mechanisms of Electrolytic Capacitor
Electrolytic capacitor technologies provide moderate energy (< 1 kJ/kg) and power
density and are polarity dependent: they have distinct positive and negative terminals
and cannot withstand a reverse voltage in excess of 1.5 V. Typical applications use
moderate to large capacitances (0.1 µF to 3 F) with voltage ratings from 5 to 500 V. The
upper operating temperature is typically limited to about 80 to 105 °C because of
conduction effects and reliability concerns. The electrical conductivity of the electrolyte
increases with temperature.
Volatilisation of the electrolyte from the cylindrical case at high temperatures leads to a
loss of capacitance and an increase in the equivalent series resistance (ESR) of 'wet'
electrolytic capacitors. The impact of decreased capacitance depends on the application
in which the capacitor is employed.
Degradation is also governed by the stability of the anode oxide layer (the dielectric),
which may deteriorate under high voltage and high temperature, and by the interaction
of the anode and cathode foils with the electrolyte. The integrity of the sealing elements
matters too, because permeation of the electrolyte solvent through the seals may cause
the capacitor to dry out. Within the context of protection relays, temperature-driven
degradation interacts strongly with PCB design and layout: capacitors in close proximity
to heat-generating elements such as high-power resistors, transformers and IC packages
may wear out faster than others.
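The temperature sensitivity of electrolytic capacitor life is often estimated with the rule of thumb that life doubles for every 10 °C reduction below the rated temperature, L = L0 · 2^((T_rated - T_op)/10). The sketch below applies that rule, which is an assumption here (manufacturers' life equations usually add ripple-current and voltage terms), with hypothetical ratings to illustrate why a capacitor placed beside a heat-generating component ages faster.

```python
# Sketch of the "10 degC" rule of thumb for electrolytic capacitor life.
# Ratings below are hypothetical; manufacturers' life equations also
# include ripple-current and applied-voltage factors.

def capacitor_life_hours(rated_life_h: float, rated_temp_c: float, operating_temp_c: float) -> float:
    """Estimated life: doubles for every 10 degC below the rated temperature."""
    return rated_life_h * 2 ** ((rated_temp_c - operating_temp_c) / 10.0)


# A hypothetical 2000 h / 105 degC part, in free air vs. next to a hot resistor
print(capacitor_life_hours(2000, 105, 45))   # 128000.0
print(capacitor_life_hours(2000, 105, 65))   # 32000.0
```

Under this rule, a 20 °C local temperature rise from an adjacent power resistor cuts the estimated life by a factor of four.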
B.3 Structural Investigation via 3D X-Ray Microtomography
A detailed structural investigation was undertaken of a number of components, selected
because they operated above ambient temperature, as these are considered more
susceptible to thermally activated degradation mechanisms. Samples of these
components, mainly transistor/IC packages, were extracted from relays with different
service histories.
Figure B-2: Internal Structure of HMOS Microcontroller Identified as Operating above
Ambient Temperature in SHNB Module 16
Particular attention was paid to die attachments and wire bonds. Signs of
packaging-related damage, namely die-attachment voiding and cracking, were observed.
For one component, this damage appeared more advanced in a relay with a 17-year
service history than in a relay with a 7-8-year history. It is not possible to say whether
the observed damage was present in the as-manufactured condition or evolved during
operation. Overall, the damage observed in the components was not extensive: the
percentage void area beneath die attachments ranged from 2.6% to 7.1%. Thus, although
a gradual degradation in thermal resistance and electrical performance is expected over
time, significant acceleration of the observed degradation mechanisms is unlikely under
the typically benign ambient conditions and in the absence of significant temperature
cycling. No signs of bond wire failure were observed.
Figure B-3: Metal can packaged voltage regulator (IC11, Modules 21/23/25, SHNB 101)
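To put the measured void fractions in perspective, a first-order estimate treats voids as non-conducting and assumes heat flows only through the bonded area, so the die-attach thermal resistance scales as R0/(1 - f) for void fraction f. This model ignores lateral heat spreading and void location (a void cluster under a hot spot matters more than the same area spread out), and the baseline resistance and power in the sketch are hypothetical.

```python
# First-order sketch: die-attach thermal resistance when a fraction of the
# bonded area is voided. Assumes voids conduct no heat and ignores lateral
# spreading; R0 and P are hypothetical.

def voided_resistance(r0_k_per_w: float, void_fraction: float) -> float:
    """R = R0 / (1 - f): heat flows only through the remaining bonded area."""
    if not 0.0 <= void_fraction < 1.0:
        raise ValueError("void fraction must be in [0, 1)")
    return r0_k_per_w / (1.0 - void_fraction)


R0, P = 5.0, 0.5   # hypothetical die-attach resistance (K/W) and dissipation (W)
for f in (0.026, 0.071):   # range of void areas reported in B.3
    r = voided_resistance(R0, f)
    print(f"{f:.1%} voids: R_attach +{r / R0 - 1:.1%}, Tj rise {P * (r - R0):.2f} K")
```

With these placeholder values the die-attach resistance rises by about 2.7% and 7.6%, and the junction temperature by less than 0.2 K, illustrating why voiding of this extent need not be critical on its own.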
Appendix C: IEEE Reliability Test System
Load Profile
Table C-1: Weekly Peak Load in Percent of Annual Peak
Week | Peak Load | Week | Peak Load
1 | 86.20% | 27 | 75.50%
2 | 90.00% | 28 | 81.60%
3 | 87.80% | 29 | 80.10%
4 | 83.40% | 30 | 88.00%
5 | 88.00% | 31 | 72.20%
6 | 84.10% | 32 | 77.60%
7 | 83.20% | 33 | 80.00%
8 | 80.60% | 34 | 72.90%
9 | 74.00% | 35 | 72.60%
10 | 73.70% | 36 | 70.50%
11 | 71.50% | 37 | 78.00%
12 | 72.70% | 38 | 69.50%
13 | 70.40% | 39 | 72.40%
14 | 75.00% | 40 | 72.40%
15 | 72.10% | 41 | 74.30%
16 | 80.00% | 42 | 74.40%
17 | 75.40% | 43 | 80.00%
18 | 83.70% | 44 | 88.10%
19 | 87.00% | 45 | 88.50%
20 | 88.00% | 46 | 90.90%
21 | 85.60% | 47 | 94.00%
22 | 81.10% | 48 | 89.00%
23 | 90.00% | 49 | 94.20%
24 | 88.70% | 50 | 97.00%
25 | 89.60% | 51 | 100.00%
26 | 86.10% | 52 | 95.20%
Table C-2: Daily Load in Percent of Weekly Peak
Day | Peak Load
Monday | 93%
Tuesday | 100%
Wednesday | 98%
Thursday | 96%
Friday | 94%
Saturday | 77%
Sunday | 75%
Table C-3: Hourly Peak Load in Percent of Daily Peak
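Seasons: Winter = weeks 1-8 & 44-52; Summer = weeks 18-30; Spring/Fall = weeks 9-17 & 31-43 (Wkdy = weekday, Wknd = weekend)
Hour | Winter Wkdy | Winter Wknd | Summer Wkdy | Summer Wknd | Spring/Fall Wkdy | Spring/Fall Wknd
1 | 67% | 78% | 64% | 74% | 63% | 75%
2 | 63% | 72% | 60% | 70% | 62% | 73%
3 | 60% | 68% | 58% | 66% | 60% | 69%
4 | 59% | 66% | 56% | 65% | 58% | 66%
5 | 59% | 64% | 56% | 64% | 59% | 65%
6 | 60% | 65% | 58% | 62% | 65% | 65%
7 | 74% | 66% | 64% | 62% | 72% | 68%
8 | 86% | 70% | 76% | 66% | 85% | 74%
9 | 95% | 80% | 87% | 81% | 95% | 83%
10 | 96% | 88% | 95% | 86% | 99% | 89%
11 | 96% | 90% | 99% | 91% | 100% | 92%
12 | 95% | 91% | 100% | 93% | 99% | 94%
13 | 95% | 90% | 99% | 93% | 93% | 91%
14 | 95% | 88% | 100% | 92% | 92% | 90%
15 | 93% | 87% | 100% | 91% | 90% | 90%
16 | 94% | 87% | 97% | 91% | 88% | 86%
17 | 99% | 91% | 96% | 92% | 90% | 85%
18 | 100% | 100% | 96% | 94% | 92% | 88%
19 | 100% | 99% | 93% | 95% | 96% | 92%
20 | 96% | 97% | 92% | 95% | 98% | 100%
21 | 91% | 94% | 92% | 100% | 96% | 97%
22 | 83% | 92% | 93% | 93% | 90% | 95%
23 | 73% | 87% | 87% | 88% | 80% | 90%
24 | 63% | 81% | 72% | 80% | 70% | 85%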
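Tables C-1 to C-3 combine multiplicatively into the chronological hourly load model of the IEEE RTS: hourly load = annual peak × weekly % × daily % × hourly %. The sketch below builds the resulting 8736-hour (52 weeks × 7 days × 24 hours) profile as a fraction of annual peak. It assumes, following the RTS convention, that each week starts on Monday and that Saturday and Sunday use the weekend columns; the function names are illustrative.

```python
# Sketch: IEEE RTS chronological hourly load (fraction of annual peak)
# built from Tables C-1 to C-3. Assumes weeks start on Monday and that
# Saturday/Sunday use the weekend columns.

WEEKLY = [86.2, 90.0, 87.8, 83.4, 88.0, 84.1, 83.2, 80.6, 74.0, 73.7,
          71.5, 72.7, 70.4, 75.0, 72.1, 80.0, 75.4, 83.7, 87.0, 88.0,
          85.6, 81.1, 90.0, 88.7, 89.6, 86.1, 75.5, 81.6, 80.1, 88.0,
          72.2, 77.6, 80.0, 72.9, 72.6, 70.5, 78.0, 69.5, 72.4, 72.4,
          74.3, 74.4, 80.0, 88.1, 88.5, 90.9, 94.0, 89.0, 94.2, 97.0,
          100.0, 95.2]                      # Table C-1, weeks 1..52
DAILY = [93, 100, 98, 96, 94, 77, 75]       # Table C-2, Monday..Sunday

# Table C-3, hours 1..24. Columns: winter wkdy, winter wknd, summer wkdy,
# summer wknd, spring/fall wkdy, spring/fall wknd.
HOURLY = [
    (67, 78, 64, 74, 63, 75), (63, 72, 60, 70, 62, 73), (60, 68, 58, 66, 60, 69),
    (59, 66, 56, 65, 58, 66), (59, 64, 56, 64, 59, 65), (60, 65, 58, 62, 65, 65),
    (74, 66, 64, 62, 72, 68), (86, 70, 76, 66, 85, 74), (95, 80, 87, 81, 95, 83),
    (96, 88, 95, 86, 99, 89), (96, 90, 99, 91, 100, 92), (95, 91, 100, 93, 99, 94),
    (95, 90, 99, 93, 93, 91), (95, 88, 100, 92, 92, 90), (93, 87, 100, 91, 90, 90),
    (94, 87, 97, 91, 88, 86), (99, 91, 96, 92, 90, 85), (100, 100, 96, 94, 92, 88),
    (100, 99, 93, 95, 96, 92), (96, 97, 92, 95, 98, 100), (91, 94, 92, 100, 96, 97),
    (83, 92, 93, 93, 90, 95), (73, 87, 87, 88, 80, 90), (63, 81, 72, 80, 70, 85),
]


def season_index(week: int) -> int:
    """0 = winter (weeks 1-8, 44-52), 1 = summer (18-30), 2 = spring/fall."""
    if week <= 8 or week >= 44:
        return 0
    if 18 <= week <= 30:
        return 1
    return 2


def hourly_profile() -> list:
    """Hourly load for the whole year as a fraction of the annual peak."""
    profile = []
    for week, w_pct in enumerate(WEEKLY, start=1):
        s = season_index(week)
        for day, d_pct in enumerate(DAILY):
            col = 2 * s + (1 if day >= 5 else 0)   # day 5, 6 = Saturday, Sunday
            for row in HOURLY:
                profile.append(w_pct / 100 * d_pct / 100 * row[col] / 100)
    return profile


profile = hourly_profile()
print(len(profile), max(profile))   # 8736 1.0
```

Multiplying the profile by the system's annual peak (2850 MW for the original single-area RTS) gives the hourly load in MW.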
Appendix D: Reliability Assessment Results for a System with Two SIPS
Table D-1: Reliability Assessment Results for SIPS Interaction
Arch1: SNG Process/Station Bus; Arch2: Dup Process/SNG Station Bus
Level | State | Measure | Arch1 (voting) | Arch1 (vetoing) | Arch2 (voting) | Arch2 (vetoing)
Normal (Level 0) | S1 N&N | Pr | 9.68×10⁻¹ | 9.54×10⁻¹ | 9.81×10⁻¹ | 9.65×10⁻¹
1 Maloperation (Level 1) | S2 D&N | Pr | 3.0×10⁻² | 4.52×10⁻² | 1.78×10⁻² | 5.73×10⁻²
 | S3 S&N | Fr | 1.75×10⁻⁶ | 1.38×10⁻⁵ | 2.07×10⁻⁵ | 1.07×10⁻⁵
2 Maloperations (Level 2) | S4 D&D | Pr | 7.98×10⁻⁴ | 9.87×10⁻⁴ | 6.56×10⁻⁴ | 1.20×10⁻³
 | S5 D&S | Pr | 5.94×10⁻⁶ | 7.03×10⁻⁶ | 4.07×10⁻⁶ | 6.98×10⁻⁶
 | S6 S&S | Fr | 4.46×10⁻⁷ | 4.38×10⁻⁷ | 4.53×10⁻⁷ | 4.31×10⁻⁷

Arch3: SNG Process/Dup Station Bus; Arch4: Dup Process/Station Bus
Level | State | Measure | Arch3 (voting) | Arch3 (vetoing) | Arch4 (voting) | Arch4 (vetoing)
Normal (Level 0) | S1 N&N | Pr | 9.80×10⁻¹ | 9.65×10⁻¹ | 9.94×10⁻¹ | 9.53×10⁻¹
1 Maloperation (Level 1) | S2 D&N | Pr | 1.85×10⁻² | 3.37×10⁻² | 5.39×10⁻³ | 4.60×10⁻²
 | S3 S&N | Fr | 2.05×10⁻⁵ | 1.67×10⁻⁵ | 2.39×10⁻⁵ | 1.36×10⁻⁵
2 Maloperations (Level 2) | S4 D&D | Pr | 7.03×10⁻⁴ | 8.33×10⁻⁴ | 5.01×10⁻⁴ | 1.00×10⁻³
 | S5 D&S | Pr | 4.21×10⁻⁶ | 6.30×10⁻⁶ | 1.40×10⁻⁶ | 7.06×10⁻⁶
 | S6 S&S | Fr | 4.52×10⁻⁷ | 4.44×10⁻⁷ | 4.60×10⁻⁷ | 4.37×10⁻⁷