RELIABILITY ASSESSMENT OF A SYSTEM INTEGRITY PROTECTION SCHEME FOR TRANSMISSION NETWORKS

A thesis submitted to The University of Manchester for the degree of

Doctor of Philosophy

in the Faculty of Science and Engineering

2017

by

Nan Liu

School of Electrical and Electronic Engineering

Contents

Contents ........................................................................................................................... 1

List of Figures .................................................................................................................. 6

List of Tables ................................................................................................................. 10

List of Abbreviations .................................................................................................... 12

Abstract .......................................................................................................................... 15

Declaration ..................................................................................................................... 16

Copyright Statement ..................................................................................................... 17

Acknowledgement ......................................................................................................... 18

Publications .................................................................................................................... 19

CHAPTER 1

INTRODUCTION ......................................................................................................... 20

1.1. Power System Reliability ................................................................................................ 20

1.2. Project Motivation and Objectives .................................................................................. 24

1.3. Contributions ................................................................................................................... 27

1.4. Outline of the Thesis ........................................................................................................ 29

CHAPTER 2

RELIABILITY OF SYSTEM INTEGRITY PROTECTION SCHEME ................ 32

2.1. Introduction of System Integrity Protection Scheme ....................................................... 32

2.1.1. SIPS Applications .................................................................................................... 34

2.1.2. SIPS Classification ................................................................................................... 36

2.2. SIPS Design Consideration ............................................................................................. 38

2.2.1. Initiating Conditions ................................................................................................. 39

2.2.2. Time Requirements .................................................................................................. 39

2.2.3. Redundancy Consideration ...................................................................................... 40

2.3. SIPS: Industry Experience ............................................................................................... 41

2.3.1. SIPS Applications and Maloperations ...................................................................... 41

2.3.2. SIPS Reliability Criteria ........................................................................................... 44

2.4. Existing SIPS Applications ............................................................................................. 46

2.4.1. Dinorwig Intertrip Scheme ....................................................................................... 46

2.4.2. PacifiCorp’s Jim Bridger RAS ................................................................................. 49

2.4.3. Southern California Edison Centralised RAS .......................................................... 51

2.5. Review of Major SIPS Maloperations ............................................................................. 54

2.5.1. Irish System Disturbance, 5th August 2005 .............................................................. 54

2.5.2. SIPS Maloperation in Nordic Grid, 1st of December 2005....................................... 56

2.6. Summary .......................................................................................................................... 59

CHAPTER 3

ASSESSING THE IMPACT OF ICT RELIABILITY ON SIPS APPLICATION ............................ 61

3.1. The Role of ICT in Power System Protection ................................................................. 61

3.1.1. Impact of ICT on Power System Protection ............................................................. 62

3.1.2. Impact of ICT on SIPS ............................................................................................. 64

3.2. Communication Infrastructure of SIPS ........................................................................... 65

3.2.1. General SIPS Communication Infrastructures ......................................................... 65

3.2.2. Wide Area Communication Network ....................................................................... 67

3.2.3. Substation Automation System ................................................................................ 68

3.2.4. Centralised SIPS: Speed Requirement ..................................................................... 69

3.3. IEC 61850 based Substation Automation System and its Reliability Model .................. 71

3.3.1. IEC 61850 based Substation Station Bus Architectures........................................... 72

3.3.2. IEC 61850-9-2 based Process Bus Architectures ..................................................... 75

3.3.3. Reliability Model of the Substation Automation System ......................................... 76

3.3.4. Reliability Data ........................................................................................................ 80

3.4. Reliability Assessment of SAS Communication Services ............................................... 81

3.4.1. Reliability of Two-terminal Communication ........................................................... 82

3.4.2. Reliability of Multi-Terminal Communication ........................................................ 85

3.4.3. Sensitivity Analysis .................................................................................................. 89

3.5. Summary .......................................................................................................................... 91

CHAPTER 4

PROTECTION AND CONTROL ASSET END-OF-LIFE ANALYSIS ................. 93

4.1. Introduction ..................................................................................................................... 93

4.1.1. Literature Review on End-of-Life Assessment ........................................................ 95

4.1.2. Asset Life Extension (ALE) Project Test Process .................................................... 99

4.1.3. Benefits and Risks of Asset Life Extension ........................................................... 101

4.2. UK National Grid Asset Life Extension Project ............................................................ 102

4.2.1. National Grid Protection and Control Asset Life Extension (ALE) Project .......... 102

4.2.2. Asset Life Extension (ALE) Study of Selected Protection Relays ......................... 103

4.2.3. Relay Defect Data Analysis ................................................................................... 106

4.2.4. Environment Influence ........................................................................................... 107

4.3. Laboratory Evaluation Results on Selected Relays ....................................................... 108

4.3.1. Laboratory Inspection ............................................................................................ 109

4.3.2. Fingerprint Performance Testing............................................................................ 112

4.3.3. Stress Testing-Simulated In-service Conditions .................................................... 120

4.3.4. In-Depth Evaluation of Modules and Components ................................................ 122

4.3.5. System Level Failure Mode, Mechanism and Effect Analysis .............................. 129

4.4. Conclusions and Future Works ...................................................................................... 129

4.4.1. Recommendations .................................................................................................. 130

4.4.2. Further Work and Application to other Equipment Types ..................................... 132

4.5. Summary ........................................................................................................................ 132

CHAPTER 5

RISK ASSESSMENT OF A SYSTEM INTEGRITY PROTECTION SCHEME 134

5.1. Literature Review of SIPS Reliability Assessment Method .......................................... 134

5.2. SIPS Risk Assessment Procedures ................................................................................ 136

5.2.1. Reliability Assessment ........................................................................................... 137

5.2.2. Impact Assessment ................................................................................................. 139

5.2.3. Risk Assessment ..................................................................................................... 140

5.3. SIPS Communication Infrastructure Modelling ............................................................ 140

5.3.1. Introduction of Studied SIPS Communication Architectures ................................ 140

5.3.2. Communication System Modelling ........................................................................ 142

5.4. SIPS Reliability Assessment ......................................................................................... 144

5.4.1. Failure Mode and Effect Analysis .......................................................................... 144

5.4.2. Markov Modelling .................................................................................................. 146

5.4.3. Reliability Block Diagram...................................................................................... 148

5.4.4. Reliability Assessment Results .............................................................................. 149

5.5. Risk Assessment Numerical Illustration: Analytical Method ........................................ 150

5.5.1. GRS Operating Logic ............................................................................................. 150

5.5.2. Analytical Risk Assessment Procedures ................................................................ 151

5.5.3. Analytical Risk Assessment Results ...................................................................... 155

5.6. Sensitivity Study ............................................................................................................ 156

5.6.1. Impact of Component Reliability on GRS Risk ..................................................... 157

5.6.2. Impact of System Conditions on GRS Risk ........................................................... 159

5.7. Summary ........................................................................................................................ 161

CHAPTER 6

RISK OF IMPLEMENTING SIPS IN A SYSTEM WITH LARGE-SCALE WIND INTEGRATION ............................ 162

6.1. Future UK Power System .............................................................................................. 162

6.1.1. Future Energy Scenarios and Wind Generation ..................................................... 163

6.1.2. Load Profiles .......................................................................................................... 164

6.1.3. Transmission Line Reinforcements ........................................................................ 165

6.2. Stochastic Risk Assessment Procedures ........................................................................ 166

6.3. System Condition Time-series Model ........................................................................... 170

6.3.1. Wind Forecast Model ............................................................................................. 170

6.3.2. Power System Load Profile .................................................................................... 172

6.4. Numerical Illustration of Stochastic SIPS Risk Assessment ......................................... 173

6.5. Stochastic Risk Assessment Results .............................................................................. 175

6.6. Comparison between Local GRS and System Wide GRS ............................................. 178

6.7. Impact of Variation in Wind Level on Risk Assessment Results .................................. 179

6.8. Summary ........................................................................................................................ 181

CHAPTER 7

MANAGING THE RISK OF SIPS IN POWER SYSTEM LONG-TERM PLANNING ............................ 183

7.1. Introduction of Electric System Planning with SIPS ...................................................... 183

7.1.1. Electric System Long-term Planning with SIPS ....................................................... 184

7.1.2. Challenges in SIPS Coordination ........................................................................... 185

7.2. Risk Assessment Methodologies Considering SIPS Interaction ................................... 187

7.2.1. Description of the System-level Multi-state Markov Model .................................. 188

7.2.2. Modified Impact Assessment Procedure ................................................................ 191

7.3. Method Numerical Illustration ...................................................................................... 192

7.3.1. Reliability Assessment Results .............................................................................. 194

7.3.2. Impact Assessment Results .................................................................................... 196

7.3.3. Risk Assessment Results ........................................................................................ 198

7.4. System Planning Incorporating SIPS............................................................................. 201

7.5. Sensitivity Study ............................................................................................................ 203

7.6. Managing the SIPS Risk Using Adaptive SIPS ............................................................. 205

7.7. Summary ........................................................................................................................ 209

CHAPTER 8

CONCLUSIONS AND FUTURE WORK ................................................................ 212

8.1. Introduction ................................................................................................................... 212

8.2. Conclusions ................................................................................................................... 213

8.3. Future Work ................................................................................................................... 219

References .................................................................................................................... 222

Appendix A: Protection Fingerprint Testing ........................................................... 228

A.1 Line Parameters and Protection Settings ....................................................................... 228

A.2 PSCAD model used for dynamic fault based distance relay testing .............................. 231

Appendix B: Vulnerable Components Assessment .................................................. 232

B.1 Vulnerable Components Examined via X-ray Tomography .......................................... 232

B.2 Component Degradation Mechanisms ........................................................................... 233

B.3 Structural Investigation via 3D X-Ray Microtomography ............................................. 235

Appendix C: IEEE Reliability Test System Load Profile ....................................... 237

Appendix D: Reliability Assessment Results for a system with two SIPS ............. 239

Word count: 64,761

List of Figures

Figure 2-1: General Structure of System Integrity Protection Scheme........................... 34

Figure 2-2: SIPS Design Process .................................................................................... 39

Figure 2-3: System Integrity Protection Scheme Typical Operating Times ................... 40

Figure 2-4: SIPS Maloperations and Causes from 2000 to 2009 NERC Reports ........... 44

Figure 2-5: One Line Diagram of North Wales Supergrid ............................................. 47

Figure 2-6: Line Outage Detection Logic used in GRS .................................................. 49

Figure 2-7: Geographic Overview of PacifiCorp’s Jim Bridger Transmission System . 50

Figure 2-8: Jim Bridger RAS Triple Modular Redundant (TMR) System ..................... 51

Figure 2-9: The Existing and Forecasted RASs in SCE’s Service Territory .................. 52

Figure 2-10: SCE CRAS High-level Network Architecture ........................................... 53

Figure 2-11: The Ireland Transmission System Map ...................................................... 55

Figure 2-12: Frequency change during Irish Disturbance on 5th August 2005 ............... 56

Figure 2-13: Nordic Grid and the Protection Schemes ................................................... 58

Figure 3-1: Supervision of Backup Relays to Prevent Zone 3 Maloperation .................. 63

Figure 3-2: General SIPS Architecture with Central Processors .................................... 66

Figure 3-3: WAN SONET Architecture ......................................................................... 67

Figure 3-4: Substation Automation Architecture from Hardwire to IEC 61850 ............ 69

Figure 3-5: Time Breakdown of a Time-Critical SIPS Application ............................... 70

Figure 3-6: Star (left) & Ring (right) Type SCN Architectures ...................................... 73

Figure 3-7: Example of IEC 62439-3 HSR Network ...................................................... 73

Figure 3-8: Redundant Double-Star (left) & Double-Ring (right) SCN Architectures .. 74

Figure 3-9: Two Process Bus Sensor Network Architectures ......................................... 76

Figure 3-10: Basic Two-Component System in (a) Series and (b) Parallel................... 77

Figure 3-11: 4-State Markov Model ............................................................................... 78

Figure 3-12: Reliability Block Diagram of different SAS Architectures for Reporting Service ............................ 83

Figure 3-13: MTTF & Cost of Considered SCN Architectures ...................................... 84

Figure 3-14: Breaker Failure Protection for Different Station Arrangements ................ 86

Figure 3-15: Communication Path of Arch7 for Distributed Function ........................... 87

Figure 3-16: Reliability of SAS to Perform Multi-Terminal Communications ............. 88

Figure 3-17: Impact of MTTF on System Unreliability for Arch 1 ................................ 90

Figure 4-1: Bathtub Curve for End-of-life Assessment .................................................. 95

Figure 4-3: ALE Project Investigation Process ............................................................. 101

Figure 4-2: UK National Grid Relay Age Distribution (by the end of 2014) ............... 105

Figure 4-4: SHNB Relay (left) and Its Comparator Module PCB (right) ..................... 110

Figure 4-5: THR Relay (left) and Its Internal PCBs (right) .......................................... 111

Figure 4-6: LFCB Relay (left) and Its Internal PCBs (right) ........................................ 112

Figure 4-7: Static Fault based Distance Relay Testing in Omicron ‘Distance Relay’ Module ............................ 113

Figure 4-8: LFCB Dual Slope Bias Characteristics ...................................................... 114

Figure 4-9: Connections for LFCB Bias Characteristic Testing ..................................... 115

Figure 4-10: Thermal Images and Components within LFCB Power Supply Module 124

Figure 4-11: Thermal Images on Components within Modules 2&3 (Relay Outputs 1&2) ............................ 124

Figure 4-12: X-Ray Tomography Images of LFCB Voltage Regulator IC14, Module 4 ............................ 127

Figure 4-13: X-Ray Tomography Images of Voltage Regulator IC23, 15-year Old Relay ............................ 127

Figure 4-14: Acoustic Microscopy Images showing the Evolution of Degradation in a TO-220 Package Die Attachment during Thermal Cycling ............................ 128

Figure 5-1: SIPS Reliability Assessment Procedures ................................................... 137

Figure 5-2: Protection and Communication Architecture of a GRS. ............................ 141

Figure 5-3: RBD to Assess the Dependability (a) and Security (b) of the Substation Sensor Network ............................ 143

Figure 5-4: Communication Path for Multicast GOOSE in PRP based Double-Ring .. 143

Figure 5-5: RBD to Assess the Dependability (a) and Security (b) of the PRP Ring LAN ............................ 144

Figure 5-6: RBD for SONET WAN (a) Dependability and (b) Security ...................... 144

Figure 5-7: Markov Model for SIPS Component Reliability Assessment .................... 146

Figure 5-8: 3-Bus System with Generator Rejection Scheme (GRS) ............................. 150

Figure 5-9: Fault Tree Analysis (FTA) to Assess the Probability of GRS DBM ......... 155

Figure 5-10: GRS Risk Assessment Results for Different Sensor Network Architectures ............................ 155

Figure 5-11: GRS Risk Comparison with and without Intertripping (I/T) Signal ........ 156

Figure 5-12: Impact of MTTF and MTTR on Risks of Different GRS Designs .......... 158

Figure 5-13: Impact of Reliability of each GRS Phase on Overall Risks for Local GRS ............................ 159

Figure 5-14: Impact of Critical Line Outage Rate on GRS Risks ................................ 160

Figure 5-15: Impact of Load Level on GRS Risks ....................................................... 160

Figure 6-1: Gone Green Transmission Generation Mix ............................................... 164

Figure 6-2: Variation in Daily Load Profile for Different Energy Scenario ................. 165

Figure 6-3: Risk Assessment Procedure using SMCS .................................................. 168

Figure 6-4: Procedures to Produce Time-Series Wind Farm Output Data .................. 170

Figure 6-5: Wind Speed Data Distribution and Wind Turbine Model ......................... 172

Figure 6-6: IEEE RTS Yearly Load Profile .................................................................. 172

Figure 6-7: IEEE 24-Bus Reliability Test System with GRS Logic ............................. 174

Figure 6-8: Comparison between System Risks with and without GRS ........................ 175

Figure 6-9: Simulated and Historical Wind Speed Data Probability Density Function 176

Figure 6-10: Coefficient of Variation in SIPS Risk with Simulation Hours................. 176

Figure 6-11: Annual Risks Induced by Different GRS Designs ................................... 177

Figure 6-12: Comparison between a Local GRS and a System-Wide GRS ................. 179

Figure 6-13: Monthly Average Wind Speed Variation over 100 years ........................ 180

Figure 6-14: GRS Risks under Various Average Monthly Wind Levels...................... 181

Figure 7-1: Conceptual Relationship between SIPS Number and System Operational Risks [96] ............................ 186

Figure 7-2: Risk Assessment Procedure Considering SIPS Interactions ...................... 187

Figure 7-3: System-level Markov Model to Assess Interaction between Two SIPS .... 188

Figure 7-4: Simplified System-level Markov Model for a System with Multiple SIPS ............................ 189

Figure 7-5: Modified PJM 5-bus System with Wind Farms ......................................... 193

Figure 7-6: Impact Assessment Results under Various System Conditions: (a) Impact of DBM related States. (b) Impact of SBM related States. ............................ 198

Figure 7-7: Annual Risk Induced by Different Local SIPS Designs ............................ 199

Figure 7-8: Annual Risk Induced by Different System-Wide Centralised SIPS Designs ............................ 200

Figure 7-9: Variation in Risks (Arch4 voting) with Number of SIPS in System ......... 201

Figure 7-10: PJM 5-Bus System with Transmission Expansion ................................... 202

Figure 7-11: Variation in SIPS risks in a planning horizon of 25 years ....................... 203

Figure 7-12: Impact of Variation in Reliability Data on SIPS Risks ............................ 204

Figure 7-13: Adaptive SIPS using WAMPC Platform ................................................. 206

Figure 7-14: Variation in SIPS Risks under Different System Conditions ................... 207

Figure 7-15: Variations in SIPS Risks in Three Typical Days ..................................... 208

Figure 7-16: Operational Logics of Adaptive SIPS during a Day for each Scenario ... 209

Figure A-1: PSCAD Model used for Dynamic Fault based Distance Relay Testing ... 231

Figure B-1: Typical Plastic Encapsulated Transistor Package ..................................... 233

Figure B-2: Internal Structure of HMOS Microcontroller Identified as Operating above Ambient Temperature in SHBM Module 16 ............................ 235

Figure B-3: Metal can packaged voltage regulator (IC11, Modules 21/23/25, SHNB 101) ............................ 236

List of Tables

Table 2-1: SIPS Categories by Type of Corrective Actions ........................................... 35

Table 2-2: SIPS Survey Results ...................................................................................... 41

Table 2-3: SIPS Failures Recorded by NERC from 1986 to 1995 ................................. 43

Table 2-4: ISA and IEC Defined Safety Integrity Level (SIL) ....................................... 45

Table 2-5: Spurious Trip Level (STL) in terms of P(SBM) and STR ............................ 46

Table 3-1: Grace Time for Substation Automation Systems .......................................... 71

Table 3-2: Reconfiguration Time for Common Redundancy Protocols ......................... 75

Table 3-3: Substation Component Reliability Data ........................................................ 81

Table 3-4: Reliability Assessment Results for Reporting Service .................................. 84

Table 3-5: Reliability Data for Conducting Distributed Functions ................................. 88

Table 3-6: RRW of each component in Arch 1&8 ......................................................... 90

Table 4-1: UK National Grid Policy on Relay Lifetime ............................................... 104

Table 4-2: Relay Population and Anticipated Lifetime ................................................ 105

Table 4-3: Maloperations for each Relay Type from 2000-2013 ................................. 106

Table 4-4: Causes of Relay Maloperations ................................................................... 106

Table 4-5: Summary of Ambient Temperatures Recorded over a Period of One Year 108

Table 4-6: Relay Samples used for Laboratory Testing................................................ 109

Table 4-7: Fingerprint Testing Results for Static Faults ............................................... 116

Table 4-8: Fingerprint Testing Results for Dynamic Faults ......................................... 116

Table 4-9: LFCB 103 (208284J) (9 years in-service time) Testing Results ................. 118

Table 4-10: LFCB 103 (547373C) (16 years in-service time) Testing Results ............ 118

Table 4-11: Alstom P545 Differential Characteristic Testing Results ......................... 118

Table 4-12: Ratings and Assessed Overload Capabilities of Protective Relays ........... 120

Table 4-13: THR PS10 Power Supply Unit Components and Voltage Stress .............. 122

Table 4-14: Thermal Imaging of LFCB Relay and Examined Hot Components ......... 125

Table 4-15: Recommended Relay Lifetime based on Evaluation Results .................... 130

Table 5-1: Substation based Sensor Network Reliability Assessment Results ............. 149

Table 5-2: LAN and WAN Reliability Assessment Results ......................................... 149

Table 5-3: Generation Data of the 3-bus System .......................................................... 150

Table 5-4: Impact Assessment for GRS Misoperation ................................................. 154

Table 5-5: Entry Point for GRS Risk to Reach below 1$/hr ......................................... 159

Table 6-1: Risk Assessment Results for Different GRS Designs ................................. 178

Table 7-1: Probability of each Operational State in a System with Two SIPS ............. 195

Table 7-2: Variation in the Probability of Interactions between SIPS for Arch4 (voting) ............................ 195

Table 7-3: Impact Assessment Data of Different SIPS Operation ................................ 196

Table 7-4: System Production Cost and Wind Curtailment with Simulation Year ...... 203

Table C-1: Weekly Peak Load in Percent of Annual Peak ........................................... 237

Table C-2: Daily Load in Percent of Weekly Peak ....................................................... 237

Table C-3: Hourly Peak Load in Percent of Daily Peak ............................................... 238

Table D-1: Reliability Assessment Results for SIPS Interaction .................................. 239

List of Abbreviations

AHI Asset Health Index

ALE Asset Life Extension

ARMA Auto-regressive and Moving Averages

BFP Breaker Failure Protection

BPU Bay Protection Unit

CB Circuit Breaker

C-GRS Centralised Generator Rejection Scheme

CRAS Centralised Remedial Action Scheme

CT Current Transformer

DANH Doubly Attached Node running High-availability Seamless Ring

DBM Dependability-based Maloperation

DG Distributed Generation

EENS Expected Energy Not Served

EM Ethernet Media

EMI Equipment Modification Instruction

EMS Energy Management System

ESW Ethernet Switch

ETYS Electricity Ten-year Statement

EV Electric Vehicle

FACTS Flexible Alternating Current Transmission System

FMEA Failure Mode and Effect Analysis

FMMEA Failure Mode, Mechanism and Effect Analysis

FTA Fault Tree Analysis

GOOSE Generic Object Oriented Substation Event

GPS Global Positioning System

GRS Generator Rejection Scheme

HMI Human Machine Interface

HSR High-availability Seamless Ring

HVDC High Voltage Direct Current

IC Integrated Circuit

ICT Information and Communication Technology


IED Intelligent Electronic Device

IEEE The Institute of Electrical and Electronics Engineers

IGMP Internet Group Management Protocol

LAN Local Area Network

LOWG Loss of Wind Generation

MTTF Mean Time to Failure

MTTR Mean Time to Repair

MU Merging Unit

NCC National Control Centre

NERC The North American Electric Reliability Corporation

OTS Operational Tripping Scheme

PCB Printed Circuit Board

PDF Probability Density Function

PLC Programmable Logic Controller

PMU Phasor Measurement Unit

PPI Protection Performance Information

PRP Parallel Redundancy Protocol

RAS Remedial Action Scheme

RBD Reliability Block Diagram

RoCoF Rate of Change of Frequency

RRW Risk Reduction Worth

RSTP Rapid Spanning Tree Protocol

RTS Reliability Test System

RTU Remote Terminal Unit

SAN Singly Attached Node

SAS Substation Automation System

SBM Security-based Maloperation

SCADA Supervisory Control and Data Acquisition

SCE The Southern California Edison

SCN Substation Communication Network

SDH Synchronous Digital Hierarchy

SIL Safety Integrity Level

SIPS System Integrity Protection Scheme


SMCS Sequential Monte Carlo Simulation

SOF System Operability Framework

SPS Special Protection Scheme

STP Spanning Tree Protocol

STR Spurious Trip Reduction

SW Switch

TMR Triple Modular Redundant

TS Time Source

UFLS Under Frequency Load Shedding

VOLL Value of Lost Load

VT Voltage Transformer

WAMS Wide Area Monitoring System

WAN Wide Area Network

WAMPAC Wide Area Monitoring Protection and Control

WECC The Western Electricity Coordinating Council

WF Wind Farm

WTG Wind Turbine Generator

Abstract

Reliability Assessment of a System Integrity Protection Scheme for Transmission Networks

Candidate: Nan Liu Institute: The University of Manchester

Degree: Doctor of Philosophy Date: August 2017

System Integrity Protection Schemes (SIPS) are being applied to power networks to

minimize the probability of large system disturbances and to cope with the growing size

and complexity of modern Power Systems. SIPS offer a timely and economical solution

which enhances the transmission capability whilst postponing the need for new

transmission facilities. However, recent SIPS related incidents reveal that SIPS

maloperations could contribute to the spread of the system disturbance and expose the

Power System to additional risks. In particular, the use of advanced Information and

Communication Technologies (ICT) in SIPS, along with the continuously ageing

protection assets used in the current GB National Grid, raises major concerns about the

reliable operation of SIPS.

The aim of this thesis is to provide an insight into the reliability of the protection

schemes in the transmission network and develop investigation methods to

quantitatively assess the risk brought by SIPS. Probabilistic techniques have been

developed to identify the optimal SIPS design in the ICT infrastructures and operational

logic, which delivers the most reliable performance and the minimal risk to system

operation.

A method based on a reliability block diagram is proposed to assess the impact of ICT

failures on the communication services in an IEC 61850 based substation automation

system. In addition, an investigation process based on function tests and invasive

examination is developed to evaluate the operational condition of the commonly used

electronic protection relay types that are approaching their predefined end of service life.

The investigation results help ensure the reliable and fast automatic protection function

against fast developing system incidents.

The risks brought by SIPS operation are studied using both analytical and stochastic

methods. A risk assessment platform based on Sequential Monte Carlo Simulation

(SMCS) is developed to capture the time-series feature of the system conditions and

assess the variation in SIPS operational risk. This thesis also describes a generic

framework of using multi-level Markov models to quantify the probability of

undesirable interactions between SIPS on the same or neighbouring systems. The

simulation results indicate that, with a widespread proliferation of SIPS, uncoordinated

SIPS operations lead to severe impact on Power System reliability. The use of adaptive

SIPS, which adjust their protection logic to the increasingly variable system condition,

could effectively mitigate the operational risk.

Declaration

No portion of the work referred to in the thesis has been submitted in support of an

application for another degree or qualification of this or any other university or other

institute of learning.

Copyright Statement

i. The author of this thesis (including any appendices and/or schedules to this

thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he

has given The University of Manchester certain rights to use such Copyright,

including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or

electronic copy, may be made only in accordance with the Copyright, Designs

and Patents Act 1988 (as amended) and regulations issued under it or, where

appropriate, in accordance with licensing agreements which the University has

from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other

intellectual property (the “Intellectual Property”) and any reproductions of

copyright works in the thesis, for example graphs and tables (“Reproductions”),

which may be described in this thesis, may not be owned by the author and may

be owned by third parties. Such Intellectual Property and Reproductions cannot

and must not be made available for use without the prior written permission of

the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and

commercialisation of this thesis, the Copyright and any Intellectual Property

and/or Reproductions described in it may take place is available in the

University IP Policy (see

http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420), in any

relevant Thesis restriction declarations deposited in the University Library, The

University Library’s regulations (see

http://www.library.manchester.ac.uk/about/regulations/) and in The University’s

policy on Presentation of Theses.

Acknowledgement

First and foremost, I would like to express my gratitude to my supervisor, Prof. Peter

Crossley, for his altruistic supervision, invaluable guidance and continuous support

throughout my PhD research. I really appreciate his helpful comments and discussions

which have contributed a lot to this achievement.

I would like to acknowledge the School of Electrical and Electronic Engineering, The

University of Manchester, for providing the financial support during my PhD studies.

I would also like to thank my colleagues and friends in the Ferranti Building. Thanks to

Dr Mathaios Panteli and Dr Zhihui Dai for their technical advice and cooperation in

publishing joint papers. I would like to acknowledge Dr Bryan Gwyn, Dr Eric Udren

and Dr Solveig Ward from Quanta Technology, Dr Pearl Agyakwa and Dr Martin

Corfield from the University of Nottingham. The project would not have been

successful without their invaluable advice. Thanks to my friend Yipeng Wang, for her

understanding and support during the stressful moments throughout my PhD.

Finally, I would like to express my deepest gratitude to my family, especially my

parents and grandparents. Thank you for your selfless support, tolerance, trust and

unconditional love throughout my life. I hope that you are happy for this achievement,

because every moment in this journey you are always standing by my side.

Publications

1. N. Liu and P. Crossley, "Assessing the Risk of Implementing System Integrity

Protection Schemes in a Power System with Significant Wind Integration," IEEE

Transactions on Power Delivery, Volume: PP, Issue: 99, 2017.

2. Z. Dai, P. Crossley, N. Liu and X. Liu, “Probabilistic Identification Method of

Distance Protection Misoperation due to Power Flow Transfer,” Int. Trans. on Electr.

Energ. Syst., Volume 27, Issue 3, March, 2017. (Journal paper)

3. N. Liu and P. Crossley, "Risk assessment of a generator rejection scheme

implemented in a wind farm," in 2016 IEEE Power and Energy Society General

Meeting (PESGM), 2016, pp. 1-5. (Conference paper, oral and poster)

4. N. Liu, M. Panteli, and P. A. Crossley, "Risk assessment of an IEC 61850 based

substation communication network in a system integrity protection scheme," in IET

International Conference on Resilience of Transmission and Distribution Networks

(RTDN) 2015, 2015, pp. 1-6. (Conference paper and oral)

5. N. Liu, M. Panteli, P. A. Crossley. “Reliability Evaluation of an All-digital System

Integrity Protection Scheme”, in 2015 PAC World Conference, Glasgow, Scotland,

29 Jun – 02 Jul, 2015. (Conference paper, oral and poster)

6. N. Liu, M. Panteli, and P. A. Crossley, "Reliability evaluation of a substation

automation system communication network based on IEC 61850," in 12th IET

International Conference on Developments in Power System Protection (DPSP

2014), 2014, pp. 1-6. (Conference paper and oral)

7. N. Liu, X. Wang, P. A. Crossley. “Impact of Harmonics on Overcurrent Protection

Relays”, in the 5th International Conference on Advanced Power System Automation

and Protection (APAP), Jeju, South Korea, 28-31 October 2013. (Conference paper

and oral)

8. M. Kuflom, P. A. Crossley, and N. Liu, "Impact of pecking faults on the operating

times of numerical and electromechanical over-current relays," in 13th International

Conference on Development in Power System Protection 2016 (DPSP), 2016, pp. 1-6.

(Conference paper and oral)

CHAPTER 1

INTRODUCTION

1.1. Power System Reliability

An electrical power network is designed for the transmission and distribution of

electricity and is required to provide an uninterrupted and high quality “main” supply to

all residential, commercial and industrial customers. Criteria in system design, planning

and operation, that incorporate existing and new technologies, have been developed

over decades to enhance the reliable and economic operation of all Power Systems, but

especially those in the developed world. Due to the continuous changes in loads, types

of generation and other key operating parameters (e.g. system inertia and rate of change

of frequency (RoCoF), etc.), the operation and protection of many Power Systems are

becoming extremely difficult and complex.

Power Systems consist of a large number of components and infrastructures spread over

a wide geographical area. Failures in any part of the system could cause interruptions

affecting an area ranging from a small number of residents up to a national network,

resulting in widespread disruption of supply. In extreme cases, the failure might black

out a region, country or union of countries, such as the EU. Historically, catastrophic

failures in Power Systems have occurred throughout the years [1]. In recent years, the

frequency of these events has continued to increase, partly due to the complex environment

brought by the deregulation of the power industry. In addition, the economic penalties

associated with such events have become ever more severe as modern society becomes

increasingly more reliant on the availability of a high-quality power supply. Power

Systems have evolved with continuing growth in demand, significant deployment of

renewables and increasing use of interconnected networks, and all these factors have

brought additional stress to the electrical network, resulting in lines and other electrical

components being operated more frequently closer to their operating limits.

Transmission system operators must continuously deal with the challenges to reduce the

intensity and severity of system disturbances and maintain Power System reliability.

Reliability is defined by Bazovsky as “the probability of a device performing its

purpose adequately for the period of time intended under the operating conditions

encountered” [2]. When applied to a Power System, it is defined as:

“(System) Reliability is a general term encompassing all the measures of the

ability of the system to deliver electrical energy to all points of utilization

within acceptable standards and in the amounts desired [3].”

The reliability of a Power System can be evaluated by considering two functional aspects [4]:

Adequacy: the existence of sufficient facilities within the system to satisfy the

consumer demand. It includes sufficiency in the generated energy and the associated

transmission and distribution networks required to transport the energy to the

customers in the long term. Adequacy is evaluated under the static conditions of the

Power System without considering system disturbances.

Security: the ability of the system to respond to disturbances arising

within the system. It is therefore associated with the response of the system to the

disturbances to which it is subjected.

The necessity of maintaining the reliability of the Power System has always been

recognized by Power System managers, designers, planners, and operators. Redundancy

in generation, transmission and distribution facilities is built in to ensure the adequacy

and continuity of the power supply, especially in the event of failures and system

outages. In addition, criteria in system operation and planning have been developed to

ensure reliable overall system capabilities. According to [5], the criteria and techniques

first used in practice were all deterministically based and were required to fulfil the

following aspects:


a) Planning Generating Capacity: The installed generating capacity must equal the

maximum demand plus a certain percentage of reserve. Historically, the period with

the highest load was used to assess the adequacy of the generating capacity. However,

with more intermittent renewable generation in the system and less conventional

fossil fuel generation, critical situations may occur when the generation output from

the renewables is low. Consequently, the variation in renewables needs to be

considered when assessing the sufficiency in generating capacity.

b) Operating Capacity: Accurate control of system voltage and frequency is an

important aspect of stable and secure system operation. Individual generators are

scheduled and dispatched to satisfy the constantly changing load demand, and keep the

frequency within acceptable limits. Reserves in spinning capacities are required to

cope with the loss of major generator units. In addition, system voltages need to be

maintained within a secure range and this is achieved by adjusting reactive power

sources, such as generators and capacitor banks. Automatic voltage regulators are

built into the generators to control the voltage to scheduled levels.

c) Planning Network Capacity: The power flow on transmission and distribution lines,

transformers and other current-carrying devices needs to be monitored to ensure the

thermal limits are not exceeded. In addition, the system needs to be operated reliably

during a system contingency, which might be the loss of a key generator or a

transmission line. This is known as the “N-1 Criterion”. The criterion specifies that

the Power System shall continue to operate in its normal operational state following

the loss of one generation unit, transmission line or transformer. Accordingly,

network defensive strategies need to be developed based on the assumption that the

equipment can and will fail unexpectedly. Both short-term planning (e.g. day-ahead

and week-ahead) and long-term planning are required to provide adequate generation

and transmission capacities to prevent widespread and uncontrolled cascading

outages during severe contingencies.
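
As a rough illustration of the deterministic criteria listed above, the following sketch computes a reserve margin and applies a simplified N-1 test to generation only. The unit names, capacities and demand values are hypothetical, and network power flows and thermal limits are ignored, so this is a toy check under stated assumptions rather than the planning procedure used in practice.

```python
from typing import Dict, List


def reserve_margin(unit_capacities_mw: Dict[str, float], peak_demand_mw: float) -> float:
    """Installed capacity in excess of peak demand, as a fraction of peak demand."""
    installed = sum(unit_capacities_mw.values())
    return (installed - peak_demand_mw) / peak_demand_mw


def n_minus_1_violations(unit_capacities_mw: Dict[str, float], peak_demand_mw: float) -> List[str]:
    """Return the units whose single outage leaves less capacity than peak demand.

    Simplification (assumption): only generation adequacy is checked;
    line and transformer outages would need a power flow study.
    """
    installed = sum(unit_capacities_mw.values())
    return [
        name
        for name, capacity in unit_capacities_mw.items()
        if installed - capacity < peak_demand_mw
    ]


# Hypothetical three-unit system.
units = {"G1": 400.0, "G2": 300.0, "G3": 300.0}
print(reserve_margin(units, 550.0))
print(n_minus_1_violations(units, 550.0))
print(n_minus_1_violations(units, 650.0))
```

An empty list means that every single-unit outage can be covered by the remaining units at the stated demand.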

Although reliability criteria have been developed, considering randomly occurring

failures in a Power System, most of them are inherently deterministic and cannot reflect

the stochastic nature of system behaviour, customer demand and component failures.

For example, deterministic analysis can consider the impact of hazards leading to a

dangerous state or system failure. However, a hazard, even if extremely undesirable,

could be of little consequence if it is very unlikely to occur. Therefore, planning based


on such hazard analysis will lead to overinvestment [6]. Consequently, probabilistic

reliability assessment methods are required to combine both severity and likelihood of

the event to reflect its true system risk. The objective of reliability evaluation is further

explained as:

“to indicate how a system may fail, the consequences of failures and also to

provide information to enable engineers and managers to relate the quality of

their system to economics and capital investments. In so doing it can lead to

better and more economic designs, and a much improved knowledge of the

operation and behaviour of a system [5].”
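
The idea of combining severity and likelihood can be sketched as a product of the two. The event names, probabilities and cost figures below are purely hypothetical and serve only to show why a severe but very unlikely hazard may carry less risk than a mild but frequent one.

```python
from typing import Dict, Tuple


def risk(probability: float, severity: float) -> float:
    """Risk of one event as probability of occurrence times its consequence."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * severity


# Hypothetical events: (probability per hour, consequence in $).
events: Dict[str, Tuple[float, float]] = {
    "rare_severe": (1e-6, 5_000_000.0),
    "frequent_mild": (1e-2, 2_000.0),
}

# Rank events from highest to lowest risk ($/hr).
ranked = sorted(events, key=lambda name: risk(*events[name]), reverse=True)
for name in ranked:
    print(name, risk(*events[name]))
```

Under these assumed numbers the frequent, mild event ranks first, which a purely severity-based hazard analysis would not show.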

A wide range of probabilistic techniques have been developed to assess the reliability of

a Power System. In general industry practice, reliability assessment is

performed using two main approaches: analytical and simulation [7]. The analytical

approach represents the system using a mathematical model and analyses its

performance under a set of “normal contingencies” selected on the basis of the

likelihood of occurrence. System behaviours are evaluated by assessing the reliability

indices from the analytical model using mathematical solutions. The analytical approach

has been widely applied in industry to help ensure reliable system operation with

relatively low computing effort. However, it is difficult to incorporate the various

operational states of a complex system or complex operating logics of emergency

control into the analytical method. The conditional probabilities of the various operational

states of a complex system are extremely difficult to estimate. Additionally, this

method is limited in dynamically assessing the impact of emergency control actions

on the system.
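
A minimal sketch of the analytical approach is given below, assuming independent two-state generating units and full enumeration of every up/down combination rather than a selected contingency list. The unit data are hypothetical, and the index computed is a simple loss-of-load probability at one fixed demand level.

```python
from itertools import product
from typing import List, Tuple


def loss_of_load_probability(units: List[Tuple[float, float]], demand_mw: float) -> float:
    """Enumerate all unit states and sum the probability of those short of demand.

    units: list of (capacity_mw, unavailability) pairs, assumed independent.
    """
    lolp = 0.0
    for state in product((True, False), repeat=len(units)):
        probability = 1.0
        available_mw = 0.0
        for in_service, (capacity, unavailability) in zip(state, units):
            if in_service:
                probability *= 1.0 - unavailability
                available_mw += capacity
            else:
                probability *= unavailability
        if available_mw < demand_mw:
            lolp += probability
    return lolp


# Hypothetical system: two 100 MW units, each unavailable 10% of the time.
two_units = [(100.0, 0.1), (100.0, 0.1)]
print(loss_of_load_probability(two_units, 150.0))
print(loss_of_load_probability(two_units, 50.0))
```

The number of states doubles with each added unit, which hints at why the analytical approach becomes difficult once many operational states and control logics have to be represented.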

The simulation approach estimates the reliability of the system by simulating the

stochastic nature of the system conditions and events, and uses this to quantify the risk.

This method can theoretically map all the contingencies and failures inherent in the

planning, design and operation process into the reliability model. These include random

system events such as outages and repairs of Power System elements, dependent events,

component behaviours, load and generation variations as well as operating policies.
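
The simulation approach can be illustrated with a small sequential Monte Carlo sketch for a single repairable component. It assumes exponentially distributed times to failure and to repair, and the MTTF, MTTR and horizon values are hypothetical; a study of a full Power System would also have to sample load, generation and operating policies.

```python
import random


def simulate_availability(mttf_h: float, mttr_h: float, horizon_h: float, seed: int = 0) -> float:
    """Estimate availability by sampling alternating up and down durations in time order."""
    rng = random.Random(seed)
    clock_h = 0.0
    up_time_h = 0.0
    while clock_h < horizon_h:
        # Sample the time to the next failure (exponential, mean MTTF).
        up = min(rng.expovariate(1.0 / mttf_h), horizon_h - clock_h)
        up_time_h += up
        clock_h += up
        if clock_h >= horizon_h:
            break
        # Sample the repair duration (exponential, mean MTTR).
        clock_h += min(rng.expovariate(1.0 / mttr_h), horizon_h - clock_h)
    return up_time_h / horizon_h


estimate = simulate_availability(mttf_h=1000.0, mttr_h=10.0, horizon_h=1_000_000.0, seed=1)
analytical = 1000.0 / (1000.0 + 10.0)  # steady-state availability of a two-state model
print(estimate, analytical)
```

Because the events are generated in chronological order, the same loop could record when each outage occurs, which is what allows time-dependent conditions to be represented.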


1.2. Project Motivation and Objectives

During the past few decades, considerable progress has been made in Power System

reliability modelling, and quantitative analysis based on probabilistic theory has been

applied to power system reliability assessment. Reliability of protection systems has

emerged as one of the most important aspects of Power System reliability due to its

impact on system operation. Researchers have been working to identify the impact of

protection failures on a Power System, to incorporate protection failures into the overall

system analysis and to enhance the Power System reliability evaluation process.

Conventional protection was designed to disconnect the faulty or overloaded elements,

whilst leaving the rest of the network in operation. Several recent system wide disturbances

indicated that local protection, which was designed to arrest a local system problem, is

limited in its ability to arrest a wide area disturbance. In addition, new protection

solutions are required due to the significant changes in the landscape of Power Systems

[8]. With an increasingly open energy market, Power Systems continue to expand,

integrating more renewable energy, distributed generation and independent power

producers. Regulatory pressure has ensured the Power System operators’ attention is

specifically focused on growing their return on asset investment, and this must be

achieved whilst energy consumption is continuously increasing and much of the Power

System infrastructure is ageing. This means existing transmission networks are

expected to operate closer to their operating limits.

Automated protection schemes are designed to detect abnormal system conditions,

typically contingency-related, and then take predetermined corrective actions to

preserve system integrity and provide acceptable system performance. These schemes

are called System Integrity Protection Schemes (SIPS). SIPS are also known as

Operational Tripping Schemes (OTS), Special Protection Schemes (SPS) or Remedial

Action Schemes (RAS). They add another dimension to conventional Power System

protection, which is limited in its ability to arrest wide area disturbances. SIPS are being used

by utilities to minimize the probability and consequence of large disturbances and to

cope with the growing size and complexity of modern Power Systems. In addition, the

integration of renewables (e.g. wind generation) generally means the sources

are remote from the load and the existing transmission lines are normally expected to

operate closer to their operating limits. SIPS offer a timely and economic solution to


enhance the capability of the existing transmission network and postpone the need for new

transmission facilities. Remedial actions of SIPS include changes in load (e.g. load

shedding), generation or changes in system configuration to maintain system stability,

acceptable voltages or power flows.

Recent surveys show a significant proliferation of SIPS installed to

enhance the transmission capability of the electrical network and accommodate more

renewable generation. The intensive application of SIPS also leads to a massive

expansion of the information and communication technology (ICT) infrastructure. The

ICT used in a Power System allows the system operator to respond to the danger caused

by abnormal system conditions in a more effective and timely manner, in this case,

preventing the propagation of system disturbances. However, the increasing penetration

of advanced ICT brings significant changes in instrumentation, monitoring,

communication and control in the Power System protection area, which raises major

concern about its reliability.

Several system disturbances demonstrated that solutions increasingly reliant on SIPS to

preserve system integrity expose the system to additional risks. The system disturbance

report issued by the North American Electric Reliability Corporation (NERC) [9] refers

to 25 SIPS-related maloperations over the period 2000 to 2009, the majority of which were

caused by hardware or software failure, faulty design logic or human error. Additionally,

a high penetration of SIPS increases the complexity of system operation. This may lead

to a higher probability of undesired interactions between the SIPS on the same or

neighbouring systems. The Irish incident on 5th August 2005 and the Nordic event on 1st

December 2005 highlighted the catastrophic impact of unintended SIPS interaction on

system reliability [10, 11].

SIPS maloperations normally lead to severe and costly consequences (e.g. customer

disconnection) due to their critical role in maintaining system integrity. Consequently, it

is vitally important to understand the failure mechanisms of SIPS and ensure that their

performance fulfils the strict reliability requirements. Most of the existing SIPS

reliability standards are deterministic and have limitations in quantitatively assessing

the risk induced by SIPS. Probabilistic techniques are therefore required to include the

impact of SIPS maloperations in the reliability assessment model.


Another potential risk to the reliability of the protection system comes from the

continuously ageing protection assets used in the current GB National Grid. A large

number of existing protection assets were commissioned in the 1980s, which means they

are approaching their designed end of service life. Consequently, an investigation

process to accurately predict the reliable service lifetime for these devices is vitally

important.

As the owner of the transmission network in England and Wales and the transmission

network operator across Great Britain, the National Grid has the obligation to ensure the

reliable delivery of electrical power without excessive cost. Motivated by these potential

risks affecting the reliability of the transmission network, the aim of this research is to

provide an insight into the reliability of the protection schemes and assets now used on

the GB transmission network, and develop investigation methods to quantitatively

assess the risks brought to system operation.

Most of the risk assessment methods developed in the previous literature focused on

using analytical reliability assessment methods to determine the optimal SIPS

operational logic and arming strategies. However, Power System operational conditions

will become more unpredictable due to the intermittent nature of renewables and

demand-side participation. A more dynamic, stochastic-based SIPS risk assessment

method is required to accurately assess the impact of fast changing system conditions on

the SIPS operational risks. With the increasing penetration of advanced ICT, changes in

the instrumentation, monitoring, communication and control in the protection system

need to be considered in the reliability assessment. Additionally, a widespread

proliferation of SIPS is expected during the next decade, partly following the greater

use of renewable generation. The current SIPS reliability assessment methods mainly

focus on assessing the reliability of a single SIPS and have limited application in

assessing the reliability of a system with multiple SIPS. Therefore, a method that can be

used to effectively incorporate SIPS interactions and their impact on system operation

needs to be developed.

The main objectives of this research are as follows:

To review major SIPS maloperations and investigate their failure modes and impact

on the propagation of system disturbances.


To undertake a literature review on the lifetime analysis of the protection assets.

Furthermore, to determine deterioration mechanisms and identify the life-limiting

elements within the electronic protection assets currently used in the UK National

Grid transmission network.

To investigate the impact of component failures, communication architectures and

maintenance strategies on the communication services used in a Substation

Automation System (SAS).

To develop investigation methods to quantitatively assess the risks brought by SIPS

on Power System operation.

To evaluate the impact of increased deployment of renewable energy on the

operational performance and long-term development of SIPS applications, and

effectively manage the additional operational risk caused by SIPS proliferation.

1.3. Contributions

Driven by the perception that today’s Power System is subjected to the additional risks

brought by protection maloperations, it is necessary to incorporate the impact of system

protection in the reliability assessment models and quantitatively assess the risk brought

to system operation. This project advances existing techniques in reliability assessment

of wide-area protection schemes in electric Power Systems. In addition, investigation

processes are developed to address risks induced by both modern ICT and the ageing

protection infrastructure commonly used in Power Systems. The main contributions of

this project can be summarized as follows:

Identification of the impact of modern ICT on Power System Protection

The application of advanced ICT in Power System protection is

reviewed. Changes brought to the instrumentation, monitoring, communication and

control systems in SIPS are demonstrated. In particular, a method to evaluate the

reliability of the communication services used in an IEC 61850 based digital

substation automation system is developed.

Review of National Grid Protection Performance Information (PPI) Reports for

the examination of the historical records of the main protection types in the UK

transmission network


The operational performance of the three most commonly used electronic relay types in

the UK National Grid transmission network is reviewed to identify any recorded

relay maloperations. In addition, the population, age profile, maloperation causes,

failure and repair history, and reports of benchmark experience from other utilities

are studied. In particular, the National Grid PPI reports from 2000-2013 are reviewed

to identify and study relay maloperations attributed to hardware failures.

Development of an end-of-life investigation process for protection asset life

extension evaluation

An investigation process based on functional testing and invasive examination is

proposed to determine the deterioration mechanisms of electronic protection devices.

This evaluation process has been successfully applied to the three most commonly

used electronic relays in the National Grid transmission network (i.e. SHNB, THR and

LFCB) and has effectively validated asset life extension decisions of five years for

each relay type.

Proposal of risk-based reliability assessment methodologies for System Integrity

Protection Schemes

The failure modes of the main SIPS components and their impact on overall SIPS

operation are determined using Failure Mode and Effect Analysis. The risks induced

during SIPS operation come from three main sources: successful SIPS operation,

dependability-based maloperation and security-based maloperation. Two different

approaches are proposed to quantify the risk of each SIPS operational state: an

analytical method based on Markov Modelling and Reliability Block Diagrams and a

stochastic method based on Sequential Monte Carlo Simulation. The two methods

are then applied to the IEEE 24 bus Reliability Test System with GRS logic.

Assessment of risk caused by unintended SIPS maloperations

As SIPS are normally perceived as a cost-effective alternative to transmission

network expansion to enhance system capability, a widespread proliferation of SIPS

is expected during the next decade, partly following the greater use of renewable

generation. High penetration of SIPS significantly increases the complexity of

system operation and also leads to a higher probability of unintended interactions



between SIPS on the same or neighbouring systems. A method based on multi-level

Markov Modelling and Sequential Monte Carlo Simulation is proposed to effectively

incorporate all the possible SIPS states. This method helps estimate the probability of

SIPS interactions and their impact on system reliability. It indicates that unintended

interactions between SIPS could result in cascading failures and lead to a more

severe impact, as compared with individual SIPS failure.

Management of the risk associated with SIPS in Power System long-term planning

The operating risk of SIPS, especially risks caused by SIPS interaction, would

increase significantly with greater wind integration and a higher penetration of SIPS.

The impact of long-term system planning on SIPS risk is considered by incorporating

transmission upgrading, demand increase and wind integration in the risk assessment

model. Different approaches are proposed to manage the continuously increasing

SIPS operational risk. The impact of transmission expansion on alleviating system

congestion as well as SIPS risks is studied. In addition, a new type of SIPS with

adaptive protection logic, which adjusts to the increasingly variable system

conditions, is designed to manage its operational risk and achieve better cooperation

with other protection schemes.

1.4. Outline of the Thesis

The thesis consists of eight chapters and is organized as follows:

Chapter 1 discusses the motivation, objectives and contributions of this research.

Chapter 2 provides an introduction to the application, classification, design

considerations and reliability requirements of a System Integrity Protection Scheme.

This is followed by a review of major system disturbances caused by SIPS

maloperations. The aim is to identify the main causes of SIPS maloperation and

investigate its impact on system operation and reliability. In addition, the sensing,

communication and control technologies used in SIPS, and the associated scheme

topologies and reliability enhancement methods are illustrated by reviewing existing

SIPS applications.



Chapter 3 investigates the impact of advanced ICT on Power System

protection. An overview of the SIPS communication infrastructure, together with its

reliability considerations, is first presented. In particular, the impact of the IEC 61850

communication protocol on the monitoring, communication, control and protection in

the substation automation system is investigated. This is followed by the reliability

assessment of the different communication services used in IEC 61850 based digital

substations.

Chapter 4 focuses on reliability assessment of electronic based protection equipment in

the UK National Grid network. The protection assets’ population, ages, current replacement

plan and historical performance record are first reviewed. An investigation process is

next developed to validate or forecast the reliable service life of a particular protection

type. The operational behaviours of the studied relays are compared with modern

numerical relays to check whether these contemporary replacement relay types could

offer a meaningful performance improvement that could influence the replacement

decision. Potential life-limiting conditions are identified by examining the operational

conditions of components on energized relay modules. Finally, management and

maintenance strategies are recommended to National Grid to ensure reliable service of

these protection assets.

Chapter 5 demonstrates the use of an analytical method to quantitatively assess the risk

brought by SIPS. A literature review of previously developed methodologies to assess

SIPS reliability is provided. The developed evaluation procedures include estimating

the probability of SIPS maloperation, evaluating the consequences of different

operational states on system reliability and quantifying the SIPS operational risk in

financial losses. Sensitivity analysis is used to identify the impact of uncertainties on the

SIPS operational risks.

Chapter 6 introduces a modified SIPS risk assessment procedure based on a stochastic

Sequential Monte Carlo Simulation to effectively reflect the time-series changes in

system condition. The future trend in energy scenarios and generation mix in the UK

Power System is estimated. An auto-regressive moving average (ARMA) model is

developed to forecast future wind speed based on historical data and is mapped into the

Sequential Monte Carlo Simulation process to reflect the intermittent nature of wind

generation. The method is then applied to the IEEE 24-bus Reliability Test System with



significant wind integration to check its impact on the operational risk of a Generator

Rejection Scheme (GRS).

Chapter 7 considers SIPS within the context of Power System long-term planning. The

widespread proliferation of SIPS introduces additional risks to SIPS operation due to a

higher probability of unintended SIPS interaction. The challenges in SIPS coordination

following increased operational complexity and SIPS penetration are first discussed.

Next, a risk assessment procedure, which is based on the method in Chapter 6, is

proposed to quantitatively assess the risk from possible SIPS interaction scenarios.

Finally, reliability enhancement methods such as transmission expansion and the use of

‘adaptive’ SIPS are introduced to manage the risk of SIPS implementation.

Chapter 8 presents the main conclusions from this research. The findings and

contributions are summarized. Finally, suggestions for future work on SIPS reliability

assessment are discussed.


CHAPTER 2

RELIABILITY OF SYSTEM INTEGRITY

PROTECTION SCHEME

2.1. Introduction of System Integrity Protection Scheme

A reliable electrical power supply is an important requirement for all consumers and

especially those working and living in advanced urban economies. Power Systems are

among the most critical infrastructures built by man. Therefore, the primary emphasis has

always been to provide uninterrupted, high quality supply to residential, commercial

and industrial customers. Spare and redundant facilities for both the generation and

transportation of electrical energy have been built to ensure the continuity of supply in

case of equipment and human failures and especially transmission line outages.

Nowadays, system-wide disturbances are becoming a more important issue in the Power

System, as illustrated by several recent blackouts [12, 13]. During a major system

disturbance, protection and control systems are required to limit or stop system

degradation, restore the system to a normal state and minimize the impact of the

disturbance. However, traditional protection systems arrest local system problems or

protect a single item of plant. Therefore, such systems have limited communication with

other parts of the system and are not intended to arrest wide area disturbances. The


impact of system-wide disturbances and the prevention of blackouts require the

protection system to be integrated with modern technologies and designed to preserve

system integrity under severe system conditions. In addition, significant changes in the

landscape of Power Systems, and especially those including significant renewable and

intermittent resources also require the use of new protection solutions. For example, in

the past, electrical power was often generated by coal stations located close to the

demand centres. In the future, however, coal stations will be decommissioned and power

will be generated by wind and nuclear, shifting the location of generation away from

load. Meanwhile, the trend in Power System planning is leading to a system with tight

operational margins and less transmission redundancy. Hence the transmission network

becomes more essential and may be expected to operate frequently close to its

operating limits.

All these fundamental changes in the design, operation and planning of the electric

Power System encourage the use of a system-wide protection solution, integrated into

the Power System and designed to minimize the probability of large disturbances and

cope with the growing size and complexity of modern Power Systems. Consequently,

automated protection schemes are designed to detect abnormal system conditions,

typically contingency-related, and then take predetermined corrective actions to

preserve system integrity and provide acceptable system performance. These schemes

are called System Integrity Protection Schemes (SIPS). SIPS are also known as

Operational Tripping Schemes (OTS), Special Protection Schemes (SPS) or Remedial

Action Schemes (RAS). They are commonly used by utilities as a timely and economic

solution to enhance system security and postpone the need for new transmission

facilities. Remedial actions of SIPS include changes in load (e.g. load shedding),

generation or changes in system configuration to maintain system stability, acceptable

voltages or power flows.

A general SIPS operation consists of three main steps: input, decision making and

control application. Examples for each SIPS operational phase are listed in Figure 2-1.

The input of SIPS is normally the electric variables measured at various locations or the

direct detection of an event such as the open/closed status of circuit breakers, etc. Inputs

from the power system are then sent to the local logic processor or control centre, where

stability analysis is performed and a decision made on whether SIPS operation is



required and how this will maintain system security. Once a decision to take action is

made, the control command will be communicated to the mitigation devices in the field

to execute the corrective action.

[Figure 2-1 block labels: Disturbance; Power System; System Integrity Protection Scheme (Input, Decision Making, Control Application). Input, e.g. power flow, voltage, ROCOF, generation and load monitoring. Decision Making, e.g. arming calculation, logic control, mitigation level calculation. Control Application, e.g. generator tripping, load shedding.]

Figure 2-1: General Structure of System Integrity Protection Scheme
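
The three operational steps described above (input, decision making and control application) can be sketched in a few lines of Python. This is a minimal illustration only and not a scheme taken from this thesis: the `Measurement` fields, the `flow_limit_mw` threshold and the rule of tripping generation equal to the overload are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Input step: quantities assumed to be sent to the logic processor."""
    corridor_flow_mw: float   # measured power flow on the monitored corridor
    line_in_service: bool     # derived from circuit breaker open/closed status


def decide(m: Measurement, flow_limit_mw: float) -> float:
    """Decision step: return the MW of generation to trip (0.0 means no action).

    Hypothetical rule: act only if a line outage is detected and the
    corridor flow exceeds the assumed post-outage limit.
    """
    if not m.line_in_service and m.corridor_flow_mw > flow_limit_mw:
        return m.corridor_flow_mw - flow_limit_mw
    return 0.0


def apply_control(trip_mw: float) -> str:
    """Control step: stand-in for the command sent to the mitigation devices."""
    if trip_mw > 0.0:
        return f"trip {trip_mw:.0f} MW of generation"
    return "no action"


if __name__ == "__main__":
    m = Measurement(corridor_flow_mw=1200.0, line_in_service=False)
    print(apply_control(decide(m, flow_limit_mw=1000.0)))
```

In a real scheme the decision step would rest on stability analysis rather than a single threshold; the sketch only shows how the three steps hand data to one another.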

2.1.1. SIPS Applications

SIPS aim to trigger corrective actions when detecting abnormal system conditions.

These actions help maintain the integrity of the Power System against the following

issues:

System congestion

Small-disturbance angle instability

Transient instability

Frequency instability

Voltage instability

Thermal overloading

Etc.

SIPS are designed to mitigate these critical contingencies, which may otherwise initiate wide

area system problems. Various remedial actions can be applied to improve system

performance. The selection of the control action is based on the power system topology

and its tolerance to the risks brought by the control actions. For example, generator

tripping is an effective method to balance the generation with load and to preserve the

transient, frequency and voltage stability. However, it may also reduce the system



inertia, damage the generator drive shaft and cause thermal stress within the generator

which may be unacceptable for some system conditions.

In general, different control strategies are applied to address various disturbance

propagations. The categories and percentages for each type of SIPS action are tabulated

in Table 2-1 according to the most recent survey on SIPS conducted by IEEE-CIGRE in 2010 [14]. It

can be seen that load rejection (10%) and under frequency load shedding (UFLS)

were the most commonly used SIPS at that time. Load rejection is a protection scheme

designed to operate following a system event which causes supply-load imbalances that

may eventually lead to a wide area system disturbance. It ensures that a system or subsystem

remains in parallel with the remaining parts of the system in case of loss of a major supply. The

load rejection SIPS differs from the automated under-frequency load shedding. It is

designed to separate the system before the change of frequency can trigger the operation

of the under-frequency relays. With the rapid growth in wind generation, the generator

rejection schemes (GRS) are becoming more frequently used to help alleviate system

congestion and allow greater access to lower-cost power.

Table 2-1: SIPS Categories by Type of Corrective Actions

Load Shedding | Generation Control - Slow speed | System Stability
Load rejection (10%) | Generator rejection (8%) | Out-of-step tripping (7%)
Under frequency load shedding (8%) | Power System stabilizer control (3%) | Voltage instability advance warning (2%)
Under voltage load shedding (6%) | Discrete excitation (1%) | Angular stability advance warning (1%)
Adaptive load mitigation (2%) | Generator runback (3%) | System separation (7%)
Overload mitigation (7%) | AGC actions (4%) | Dynamic braking (1%)

Controls - Slow speed | Controls - High speed reactive voltage compensation | Congestion Mitigation
Tap-changer control (2%) | Bypassing series capacitor (2%) | Congestion mitigation (3%)
Turbine valve control (1%) | Shunt capacitor switching (5%) | Load and generation balancing (3%)
Black-start or gas turbine start-up (1%) | SVC/STATCOM control (4%) | Busbar splitting (2%)
 | HVDC controls (3%) |



2.1.2. SIPS Classification

SIPS are installed to preserve stability or integrity of the overall Power System or its

strategic portions. Therefore, the application of SIPS may require multiple monitoring

and implementation devices allocated all over the system and the utilization of

communication facilities. SIPS can be classified by many factors, including architecture,

input variables and operating times, etc.

1) SIPS Architectures

SIPS can be classified in terms of their architectures. For example, based on the

physical location of the sensing, decision making, and control devices and the impact of

the scheme on the Power System, SIPS can be classified into the following categories:

a) Flat Architecture: For this type of SIPS, all the measurement, decision-making and

control devices of the SIPS are typically located in one location. The decision

making and the initiation of the corrective action may also require remote

information collected by the communication facility. Operation of this type of SIPS

normally has a dedicated function and only affects a portion of the system. An

example of flat architecture is the under frequency load shedding (UFLS) scheme.

The UFLS relays are normally distributed at different locations in the network and

operate to trip preselected circuit breakers to disconnect small sections of distributed

network and their associated loads when frequency drops below pre-set values.

b) Hierarchical Architecture: SIPS with hierarchical architecture involve multiple

steps in their control actions. This type of SIPS requires communication between

substations to transfer the local measurements and predetermined parameters to

multiple control locations and is able to conduct its decision based on a system-wide

view. The operating logic involves the use of operating nomograms. State estimation

and contingency analysis can also be integrated into the decision making process.

Consequently, the coordination between different protection actions throughout a

wide area network can be achieved via multi-level corrective actions. For example, a

system separation scheme is normally a hierarchical architecture which involves

monitoring of multiple interconnecting circuits, sending trip signals to circuits at

different locations and altering the power flow on other interconnectors.
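
The UFLS scheme given as the flat-architecture example in item a) can be illustrated with a short sketch. The stage thresholds and load percentages below are hypothetical values chosen for illustration; they are not settings from this thesis or from any particular grid code.

```python
# Hypothetical stages: (frequency threshold in Hz, percent of load shed at that stage).
UFLS_STAGES = [(48.8, 5), (48.6, 10), (48.4, 10)]


def load_shed_percent(frequency_hz: float, stages=UFLS_STAGES) -> int:
    """Return the cumulative percentage of load disconnected at this frequency.

    Each stage trips its preselected breakers once the frequency drops below
    its pre-set value, so lower frequencies shed progressively more load.
    """
    return sum(percent for threshold, percent in stages if frequency_hz < threshold)


if __name__ == "__main__":
    for f in (50.0, 48.7, 48.5, 48.0):
        print(f, load_shed_percent(f))
```

Each stage acts on a local frequency measurement only, which is why a single layer of decisions is sufficient for this type of scheme.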



The main difference between the two SIPS architectures is the necessity for the control

coordination to take a higher and wider system view to implement the protection actions.

A flat architecture involves a single layer of decisions and actions whereas the

hierarchical scheme may involve multilayers of decision making and control actions and

requires communication between substations. The application of the two architectures is

also dependent on the system condition and the required protection speed. Immediate

control actions initiated by the local schemes are sufficient in some small systems.

However, in a large and highly interconnected system, control coordination with state

estimation operating over a wide-area system may be required to prevent the

propagation of the disturbance.

2) Centralised and Distributed Schemes

Another classification is to separate SIPS into centralised SIPS and distributed SIPS

based on the location of its controller and corrective devices. In a distributed scheme,

the controllers installed at different locations in a system are used to implement the

corrective actions. The distributed intelligent electronic devices (IED) can process the

local information based on local requirements. The system protection function can be

realized by integrating and coordinating the distributed controllers which provide the

corrective actions. For a centralised scheme, the wide area monitoring system (WAMS)

gathers all the information required from local and remote stations to one location, where

the decision-making process is implemented. The centralisation of the distributed

information can be realized as part of the energy management system (EMS), using a

centralised programmable logic controller (PLC) or remote terminal unit (RTU).

3) Input variables

According to the input variables and decision making process as described in [15], SIPS

can also be classified as follows:

a) Event-based: In an event-based scheme, electrical outages are directly detected and

initiate the corresponding emergency action such as generation rejection and load

tripping.

b) Parameter-based: Parameter-based schemes are initiated by significant changes in

the measured variables.



c) Response-based: In response-based schemes, system response during emergencies

is monitored and a closed loop is incorporated in the decision making process to

determine the best response to the system situation.

d) Combination of the above: In practice, most of the schemes are combinations of

the above types of schemes. For example, some schemes are triggered by a combination

of events and parameters.
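
The difference between the trigger types listed above can be made concrete with a small sketch. All function names, thresholds and the toy frequency model are assumptions introduced for illustration; a real response-based scheme would use measured system response rather than a formula.

```python
from typing import Callable


def event_based(line_outage_detected: bool) -> bool:
    """Event-based: act directly on a detected outage."""
    return line_outage_detected


def parameter_based(voltage_pu: float, threshold_pu: float = 0.9) -> bool:
    """Parameter-based: act when a measured variable crosses a threshold."""
    return voltage_pu < threshold_pu


def combined(line_outage_detected: bool, voltage_pu: float) -> bool:
    """Combination: here, require both the event and the parameter condition."""
    return event_based(line_outage_detected) and parameter_based(voltage_pu)


def response_based(measure_hz: Callable[[float], float],
                   target_hz: float = 49.5,
                   step_mw: float = 50.0,
                   max_shed_mw: float = 500.0) -> float:
    """Response-based: closed loop that sheds load in steps until the
    monitored frequency recovers to the target (or a cap is reached)."""
    shed_mw = 0.0
    while measure_hz(shed_mw) < target_hz and shed_mw < max_shed_mw:
        shed_mw += step_mw
    return shed_mw


def toy_system(shed_mw: float) -> float:
    """Hypothetical system response: frequency rises linearly with load shed."""
    return 49.0 + shed_mw / 500.0


if __name__ == "__main__":
    print(combined(True, 0.85), response_based(toy_system))
```

The closed loop is what distinguishes the response-based type: the amount of action is not fixed in advance but follows from the observed system response.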

2.2. SIPS Design Consideration

As shown in Figure 2-2, the SIPS design process can be broken down into five steps [16]:

1) System Study: Accurate system study needs to be completed to identify all the

contingency scenarios and determine the parameters required by the control centre

from the monitoring system of SIPS. In particular, the thermal, voltage or angular

instability related system limitations or restrictions under various system

contingencies are evaluated. The arming criteria and reliability levels also need to be

determined.

2) Solution Development: The minimum actions required for each type of system

contingency are determined based on a system study. The corrective actions for

different SIPS applications can be found in Table 2-1. For example, the amount of

load shedding, generator rejection, stability limits and voltage limits for the different

SIPS implementations are determined in this stage.

3) Design and Implementation: In the implementation stage, practical issues need to

be addressed as listed in [16]. Questions regarding the technology/functional

requirement, cost effectiveness, maintenance plan, complexity, redundancy, logic

development and physical architecture of the implementation need to be discussed.

4) Commissioning & Periodic Testing: Successful implementation of SIPS

relies on a proper testing plan which should include lab testing, field testing, study

validation and periodic testing.

5) Training & Documentation: SIPS failures caused by faulty logic design and human

errors account for 42% of total failures based on a previous survey [17]. Proper training

of operating and maintenance staff helps reduce human errors and ensure reliable

operation. Complete documentation about SIPS functionalities and their operational

record will improve the efficiency in staff training.



[Figure 2-2 step labels: System Study; Solution Development; Design & Implementation; Commissioning & Periodic Testing; Training & Documentation.]

Figure 2-2: SIPS Design Process

The operational requirements of SIPS are derived from system planning studies which

identify the performance criteria following system contingencies. Among the most

important features identified from the system study are [18]:

2.2.1. Initiating Conditions

The critical system contingencies that initiate SIPS operation, if the scheme is armed, are

identified as SIPS initiating conditions. These may require local or wide-area devices to

measure the following parameters:

Voltages and/or currents

Frequency

Control signals, e.g. automatic voltage regulator, Power System stabilizer,

generator governors, reactive power compensation including HVDC converters and

FACTS, etc.

Status including circuit breaker position, tap-changer position, and disturbance

recorder start signals, etc.

Arming: levels, thresholds, automatic/dynamic, and manual.

The arming criteria determine the system conditions for which SIPS are switched into

the standby mode and are ready to take control actions. SIPS are normally designed to

monitor the load level, generation level, voltage, frequency, breaker status and other

quantities which help identify the emergence of Power System problems. The

information required at different locations can be collected using SCADA and EMS

computers and then processed by programmable logic controllers, microprocessor-based

relays and other IEDs. The arming process can be done either automatically or manually.
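
The relationship between arming and initiating conditions can be sketched as follows. The function names, the use of a single transfer threshold as the arming criterion and the manual override are assumptions made for illustration only.

```python
from typing import Optional


def is_armed(transfer_mw: float,
             arming_threshold_mw: float,
             manual_setting: Optional[bool] = None) -> bool:
    """Return True if the scheme is in standby and ready to act.

    Automatic arming (assumed rule): arm when the monitored transfer reaches
    the threshold. A manual setting, if given, takes precedence.
    """
    if manual_setting is not None:
        return manual_setting
    return transfer_mw >= arming_threshold_mw


def should_operate(armed: bool, initiating_condition: bool) -> bool:
    """The scheme acts only if it is armed AND an initiating condition occurs."""
    return armed and initiating_condition


if __name__ == "__main__":
    armed = is_armed(transfer_mw=1500.0, arming_threshold_mw=1200.0)
    print(should_operate(armed, initiating_condition=True))
```

Separating the two checks reflects the text above: an initiating contingency on a lightly loaded system, where the scheme is not armed, does not lead to any control action.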

2.2.2. Time Requirements

The maximum allowable time for the remedial action to be accomplished needs to be

determined. Stability problems typically have the fastest “action” requirements; they



may be as fast as a few cycles, but usually require operation in less than one second.

Voltage collapse problems may allow a response to be delayed for several seconds,

whilst actions to mitigate thermal overloading could occur after several minutes.

SIPS may exist in a stand-alone mode to provide fast actions using local data, or it may

use system wide data for decision making. The latter may require longer operating times

due to the communication of data between the measuring devices and the control centre.

Examples of IED based SIPS include detecting changes in system topology and

detecting loss of synchronism. EMS based SIPS take a more ‘static’ view of the Power

System, and generally use the communication interface of the SCADA/EMS function.

Actions such as optimal power flow and emergency load control can be used by this type of

SIPS.

[Figure 2-3 labels: Communication Requirements (Local, Centralised); operating time (Milliseconds, Seconds, Minutes); schemes shown: IED System Integrity Protection Scheme, Wide-area System Integrity Protection Scheme, Energy management System Integrity Protection Scheme.]

Figure 2-3: System Integrity Protection Scheme Typical Operating Times

2.2.3. Redundancy Consideration

As with conventional protection, redundancy in SIPS design has to be considered.

This ensures that the removal of one scheme component following a failure, or perhaps

maintenance, will not affect the normal operation of the scheme. Redundancy

requirements cover each aspect of a SIPS design including: detecting, arming, power

supply, communication IEDs and logic controllers. Redundant components need to be

provided in SIPS design. In addition, since the communication system is the backbone

of a SIPS application, the reliability of the overall communication path becomes critical



in SIPS operation. Therefore, normal SIPS operation needs to be ensured after the loss

of a single communication path.

The introduction of redundancy in a SIPS system will lead to an increased probability of

unwanted SIPS operation (i.e. security-based maloperation, SBM). Similar to failure to operate, undesired SIPS

operation will also have an adverse impact on the Power System. Therefore, a voting or

vetoing scheme could be designed to balance the trade-off between SIPS dependability

and security.

a) Voting: the logic solver, upon receiving multiple commands from duplicated

detection systems, is programmed to perform a voting provision. That is, if one of

the systems detects a line-outage, the logic solver will make the trip decision to

initiate the event-based GRS.

b) Vetoing: vetoing logic compares the output signals from multiple systems. The

logic solver needs to validate the decision between the redundant systems prior to

issuing any trip decision. If the output of each system is different from each other,

the system will veto the trip decision, enhancing system security. Therefore,

incorrect SIPS operation due to misinterpretation of inputs or data will be mitigated.
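
The trade-off between the two logics can be illustrated numerically for two duplicated detection systems. The sketch below assumes that the two systems fail independently, that each misses a required operation with probability `p_miss`, and that each issues a false trip with probability `p_false`; these symbols and the example values are assumptions for illustration and are not data from this thesis.

```python
def voting_trip(a: bool, b: bool) -> bool:
    """Voting as described above: trip if either system detects the outage."""
    return a or b


def vetoing_trip(a: bool, b: bool) -> bool:
    """Vetoing: trip only if both systems agree; any disagreement vetoes."""
    return a and b


def maloperation_probabilities(p_miss: float, p_false: float) -> dict:
    """Return DBM and SBM probabilities for both logics (independence assumed).

    DBM: the scheme fails to operate when required.
    SBM: the scheme operates when not required.
    """
    voting = {
        "DBM": p_miss ** 2,                 # both systems must miss
        "SBM": 1 - (1 - p_false) ** 2,      # either false trip is enough
    }
    vetoing = {
        "DBM": 1 - (1 - p_miss) ** 2,       # either miss blocks the trip
        "SBM": p_false ** 2,                # both must trip falsely
    }
    return {"voting": voting, "vetoing": vetoing}


if __name__ == "__main__":
    for logic, probs in maloperation_probabilities(0.01, 0.01).items():
        print(logic, probs)
```

Under these assumptions voting improves dependability at the expense of security, whereas vetoing does the opposite, which is the trade-off the logic solver design has to balance.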

2.3. SIPS: Industry Experience

2.3.1. SIPS Applications and Maloperations

Surveys to investigate SIPS in existence worldwide were conducted by IEEE and

CIGRE in 1989, 1996 and 2009 respectively [14, 17, 19]. The results show a significant

growth in the number of schemes as indicated in Table 2-2. It is apparent that SIPS are

now widely used by electrical utilities as a solution to defend against large disturbances.

Consequently, reliable operation of SIPS needs to be ensured, as failure to achieve

adequate reliability exposes the Power System to additional risks, especially those

resulting from SIPS maloperations.

Table 2-2: SIPS Survey Results

| 1989 Survey | 1996 Survey | 2009 Survey
Respondents | 18 | 49 | 110
Schemes | 93 | 111 | 958



When operating as designed, SIPS can effectively prevent system degradation during

contingencies. However, due to its critical role in preserving system integrity,

misoperations of SIPS normally lead to severe and costly consequences and this raised

concerns when SIPS were initially implemented. Respondents to the 1996 survey were asked to

estimate the costs of both operational failure and unnecessary operation of SIPS. It

indicated that the cost of SIPS failure can be very high, since most of those responding

selected the highest cost category, which is above 500,000 USD. Meanwhile,

unnecessary SIPS operation also incurs a cost, although lower than that of failure to

operate, with a penalty of between 10,000 and 100,000 USD. Therefore, consideration in the

assessment of SIPS performance should be given in terms of both dependability and

security. According to PRC-004-WECC-1 [18], failure of SIPS can be classified in

two ways:

a) Dependability-based Maloperation (DBM): Dependability-based maloperation is

the absence of a protection system or RAS operation when required. Dependability

is a component of reliability and is the measure of device certainty to operate when

required.

b) Security-based Maloperation (SBM): Security-based maloperation refers to a

misoperation caused by the incorrect operation of a protection system or RAS.

Security is a component of reliability and is the measure of a device’s certainty not

to operate falsely.
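
Because the two maloperation types carry different costs, they can be combined into a single expected-cost figure. The sketch below assumes the common definition of risk as probability multiplied by consequence. The cost figures are taken from the survey cost categories quoted above (500,000 USD as the lower bound of the highest category, 100,000 USD as the upper bound of the unnecessary-operation range); the probabilities are purely hypothetical.

```python
def expected_maloperation_cost(p_dbm: float, cost_dbm_usd: float,
                               p_sbm: float, cost_sbm_usd: float) -> float:
    """Expected cost per demand: sum of probability x consequence for
    dependability-based (DBM) and security-based (SBM) maloperation."""
    return p_dbm * cost_dbm_usd + p_sbm * cost_sbm_usd


if __name__ == "__main__":
    # Probabilities below are hypothetical and chosen only for illustration.
    risk = expected_maloperation_cost(
        p_dbm=0.01, cost_dbm_usd=500_000,   # failure to operate
        p_sbm=0.03, cost_sbm_usd=100_000,   # unnecessary operation
    )
    print(f"expected cost: {risk:.0f} USD")
```

Even with a lower unit cost, SBM can contribute a comparable share of the total if it occurs more often, which is why both dependability and security have to be assessed.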

The system disturbance reports published by the North American Electric Reliability

Corporation (NERC) were reviewed to identify the root cause of SIPS failures. NERC

has published its findings on system disturbances, demand reductions and unusual

occurrences in the bulk Power Systems in North America since 1979. With a mission to

assure Power System reliability and security, NERC’s area of responsibility covers the

continental United States, Canada, and the northern portion of Baja California, Mexico.

From 1986 to 1995, 24 system disturbances involved operation of SIPS [20].

Among them, 16 cases were reported as successful operation, while 8 involved

operation failures. The proportion of SIPS operation failures (8 of 24, about 33%) was therefore extremely high.



Table 2-3: SIPS Failures Recorded by NERC from 1986 to 1995

Events | SIPS Type | Main Cause | Consequence | Date
WSCC-Northeast/Southeast Separation Scheme | System Separation | Faulty design | 1902 MW generation lost and 253 MW load interruption | 04/04/1988
NPCC-Hydro-Québec | Load Rejection | Hardware failure | System-wide blackout | 18/04/1988
NPCC-Hydro-Québec | Load Rejection | Hardware failure | 3950 MW load interruption | 15/11/1988
British Columbia Hydro / TransAlta Separation | Controlled opening of lines | Arming failure | Cascade line outage | 07/01/1990
Garrison-Taft 500 kV No.1&2 outages | Var Compensation | Faulty logic design | 119 MW generator tripping / 25 MW load interruption | 08/01/1990
SE Idaho/SW Wyoming Outage | Generator Rejection | Hardware failure | Cascade line outage | 09/12/1991
Pacific AC Intertie Separation | System Separation | Software failure | Failed to separate system; however, no severe consequences | 17/11/1991
Minnesota - Wisconsin Interface 69 kV conductor burn down | Controlled opening of lines | Wrong settings | Two 69 kV lines burned | 13/10/1992

25 SIPS maloperations were reported during the period from 2000 to 2009 [9], which

was more than 3 times the number of SIPS maloperations from 1986 to 1995. The number

of SIPS has grown significantly, hence this result is as expected. Among them, 18 cases were

identified as a consequence of unnecessary SIPS operation (i.e. SBM), taking up 72% of

the total operational failures, while 7 cases were caused by dependability based

misoperations. A SIPS failure to operate can be caused by hardware failure, software

failure, faulty design logic and human error. Figure 2-4 shows the causes of SIPS

misoperations. Among the recorded SIPS maloperations, hardware failures are the most

common cause, accounting for 36% of the total failures. These normally result from physical stress

on the installed components, while software failures are caused by errors in

vendor- or user-written embedded, application and utility software. Faulty design

logic may occur as a result of an inappropriate or incomplete system study during SIPS

design. Human errors can be classified based on whether they are associated with

construction, operation, or maintenance.
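
The proportions quoted for the two NERC reporting periods follow directly from the case counts given in the text, as the short calculation below shows. Only the counts stated above are used; the variable and function names are illustrative.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a fraction."""
    return part / total


# Counts as stated in the text.
disturbances_1986_1995 = 24   # disturbances involving SIPS operation
failures_1986_1995 = 8        # of which operation failures
maloperations_2000_2009 = 25  # reported SIPS maloperations
sbm_2000_2009 = 18            # unnecessary operations (SBM)
dbm_2000_2009 = 7             # failures to operate (DBM)

failure_share = share(failures_1986_1995, disturbances_1986_1995)
sbm_share = share(sbm_2000_2009, maloperations_2000_2009)
dbm_share = share(dbm_2000_2009, maloperations_2000_2009)
growth_ratio = maloperations_2000_2009 / failures_1986_1995

if __name__ == "__main__":
    print(f"1986-1995 failure share: {failure_share:.1%}")
    print(f"2000-2009 SBM share: {sbm_share:.0%}, DBM share: {dbm_share:.0%}")
    print(f"maloperation count ratio: {growth_ratio:.3f}")
```

The ratio confirms the statement that the later period recorded more than three times as many maloperations, and the SBM share matches the 72% quoted.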

The SIPS historical maloperation record indicates that the majority of SIPS maloperations are

SBM. This is because protection systems were originally designed with a bias towards

dependability. Consequently, the SIPS design considerations and operational logics are

further discussed in this thesis to effectively balance the trade-off between SIPS

dependability and security. Currently, component hardware failures are the most



common cause of SIPS failures. However, in the future, with the application of more

ICT and IEDs in SIPS application, software is more likely to become the main issue

leading to SIPS failure. A detailed study of SIPS communication architectures will be

provided in the next chapter considering the penetration of modern ICTs.

Figure 2-4: SIPS Maloperations and Causes from 2000 to 2009 NERC Reports

2.3.2. SIPS Reliability Criteria

The 1996 survey indicated most of the reliability criteria for the SIPS designs were

qualitative rather than quantitative. Moreover, some respondents did not use any

reliability criterion to assess the performance of the SIPS. This situation has

significantly changed following the global proliferation of SIPS as well as the increase

in SIPS maloperations. Currently, organisations such as the North American Electric Reliability

Corporation (NERC) have developed multiple SIPS reliability standards and assessment

procedures. These include the description of the system studies that need to be carried

out prior to initial installation and commissioning, the periodic assessment procedures

and the historical SIPS performance database, etc. A few of the standards are reviewed

[21]:

PRC-004-WECC-1 Protection System and Remedial Action Scheme (RAS)

Misoperation: This is a regional reliability standard, designed to ensure that maloperations

of all generation and transmission protection systems and transmission-related SIPS

are analysed and mitigated. The following requirements

apply to Western Electricity Coordinating Council (WECC) RAS: 1) All RAS

operations and tripped transmission elements need to be reviewed within 24 hours to

analyse the correctness of the operation. 2) If a RAS has a security-based maloperation,

it needs to be removed from the system within 22 hours; a RAS with either a DBM or an

SBM must be replaced with a functionally equivalent protection system


(FEPS) within 20 business days. 3) The transmission owners are required to submit

RAS maloperation incident reports to WECC within 10 business days to identify the

main causes of the incident and assist repairs and replacement of the maloperated RAS.

PRC-015-0 Special Protection System Data and Documentation: This standard is intended to

ensure the proper design and coordination of all the SIPS. It also specifies the

maintenance and testing procedures and ensures that maloperations are analysed and

corrected. A database needs to be created and maintained for each RAS installed,

including the following information: 1) Contingencies and system conditions for which

the RAS is required to operate. 2) The remedial actions taken by the RAS in response to

a system contingency. 3) The detection logics and relay settings of the RAS.

Information Required to Assess the Reliability of a RAS Guideline: This document

provides a framework for the Remedial Action Scheme Reliability Subcommittee

(RASRS) to evaluate SIPS. It describes the procedure for periodic SIPS assessment and

the information required for reliability assessment. A RAS review is required prior to

initial installation and commissioning, before significant modifications or extensions,

after a failed operation and before removal from service. The periodic assessment needs

to be performed at least every five years for compliance with NERC and WECC

standards.

There are also some international standards which can be quantitatively enforced and

applied to SIPS reliability assessment. The International Society of Automation (ISA)

and the International Electrotechnical Commission (IEC) define Safety Integrity Level

(SIL) as “a relative level of risk-reduction provided by a safety function, or it can be used

to specify a target level for risk reduction” [22]. SIL can be expressed as a probability of

failure on demand P(DBM) or as a risk reduction factor (RRF). Table 2-4 describes the

four SIL levels in terms of P(DBM) and RRF, with SIL-4 being the level with the

highest reliability and SIL-1 being the lowest.

Table 2-4: ISA and IEC Defined Safety Integrity Level (SIL)

SIL   Availability    P(DBM)            RRF
4     >99.99%         1E-05 to 1E-04    10,000 to 100,000
3     99.90-99.99%    1E-04 to 1E-03    1,000 to 10,000
2     99.00-99.90%    1E-03 to 1E-02    100 to 1,000
1     90.00-99.00%    1E-02 to 1E-01    10 to 100
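
The bands in Table 2-4 can be read as a simple lookup. The following is a minimal sketch, assuming that each band includes its lower bound and excludes its upper bound (the table does not say how boundary values are treated) and that RRF is taken as the reciprocal of P(DBM), which matches the ranges in the table. The function names are illustrative only.

```python
from typing import Optional


def sil_from_p_dbm(p_dbm: float) -> Optional[int]:
    """Return the SIL band (1 to 4) of Table 2-4 for a given P(DBM).

    Assumption: SIL n covers 10^-(n+1) <= P(DBM) < 10^-n.
    Returns None when the value lies outside the table.
    """
    for sil in (4, 3, 2, 1):
        low, high = 10.0 ** -(sil + 1), 10.0 ** -sil
        if low <= p_dbm < high:
            return sil
    return None


def risk_reduction_factor(p_dbm: float) -> float:
    """RRF taken as the reciprocal of the probability of failure on demand."""
    return 1.0 / p_dbm


if __name__ == "__main__":
    for p in (2e-5, 5e-4, 0.05, 0.5):
        print(p, sil_from_p_dbm(p), round(risk_reduction_factor(p)))
```

A scheme with P(DBM) = 5E-04, for example, falls in the SIL-3 band with an RRF of about 2,000.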


Whilst safety integrity level (SIL) is used to evaluate system dependability, spurious trip

level (STL) complements SIL by defining the probability of unscheduled spurious trips

of the system. Table 2-5 shows the range of STL levels, expressed as probability of

spurious operation P(SBM) and spurious trip reduction (STR) values [23]. The higher

the STL level, the lower the probability that spurious trips will occur in the system.

Improving SIPS operational performance in terms of both SIL and STL can be a

complex process, since any increase in the SIL level may result in a decrease in STL. In

practice, any increase in system redundancy may result in better performance in terms

of system dependability, but worse performance in terms of system security. Therefore,

the SIPS reliability enhancement method needs to be carefully designed to effectively

balance the trade-offs between security and dependability.
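
The effect of redundancy on the two measures can be illustrated numerically. The sketch below assumes n identical and statistically independent channels arranged so that any single channel can initiate the action (1-out-of-n); the per-channel probabilities are hypothetical values chosen only for illustration and are not taken from any of the reviewed schemes.

```python
def one_out_of_n(p_fail: float, p_spurious: float, n: int):
    """Return (P(DBM), P(SBM)) for n independent, identical channels
    where any one channel can trip the scheme (1-out-of-n).

    p_fail     : probability that one channel fails to operate on demand
    p_spurious : probability that one channel operates spuriously
    """
    p_dbm = p_fail ** n                      # every channel must fail to miss a demand
    p_sbm = 1.0 - (1.0 - p_spurious) ** n    # one spurious channel is enough to trip
    return p_dbm, p_sbm


if __name__ == "__main__":
    # Hypothetical per-channel values, for illustration only
    for n in (1, 2, 3):
        p_dbm, p_sbm = one_out_of_n(p_fail=0.01, p_spurious=0.001, n=n)
        print(f"n={n}: P(DBM)={p_dbm:.2e}, P(SBM)={p_sbm:.2e}")
```

Under these assumptions, each added channel lowers P(DBM) by two orders of magnitude while P(SBM) grows roughly in proportion to n, which is the dependability and security trade-off described above.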

Table 2-5: Spurious Trip Level (STL) in terms of P(SBM) and STR

STL   P(SBM)              STR
X     1E-(X+1) to 1E-X    10^X to 10^(X+1)
---   ---                 ---
4     1E-05 to 1E-04      10,000 to 100,000
3     1E-04 to 1E-03      1,000 to 10,000
2     1E-03 to 1E-02      100 to 1,000
1     1E-02 to 1E-01      10 to 100

2.4. Existing SIPS Applications

This section provides a better understanding of the deployment, design and operation

of SIPS by reviewing some existing SIPS applications. The

sensing, communication and control technologies used in SIPS, and the associated

scheme topologies and reliability enhancement strategies, are illustrated.

2.4.1. Dinorwig Intertrip Scheme

The Dinorwig Intertrip scheme, as deployed at the Dinorwig pumped hydro station in

North Wales, is designed to preserve the stability of the North Wales supergrid area.

Commissioned in 1984, Dinorwig station is composed of six 330 MVA generators, also

capable of operating as six 312 MVA motors for pumping purposes [24]. The original

purpose of the hydro station was to provide storage capacity for the excess power

generated by the nearby nuclear stations at times of low demand. In the early 1980s,

Britain had an excess of base-load nuclear generation during summer nights. Nowadays, Dinorwig


is operated as short-term operating reserve (STOR) and provides fast response to rapid

changes in power demand (e.g. sudden load pickup) or sudden loss of power stations.

Figure 2-5: One Line Diagram of North Wales Supergrid [25]

The Trawsfynydd-Deeside and Trawsfynydd-Legacy circuits are among the most

critical circuits in the North Wales Supergrid. An outage of either circuit followed by a

fault resulting in the loss of both Deeside-Pentir circuits would leave only one

operational circuit from the North Wales power stations to the rest of the GB system.

This could cause instability at both Dinorwig and the nearby nuclear stations and result

in a high probability of circuit overloading. These system emergencies can be

effectively alleviated by tripping a certain number of generators or motors at Dinorwig.

Two power-measuring relays, monitoring the power absorbed (pumping) and the power

generated by Dinorwig, are deployed by the intertrip scheme. This ensures the machines

at Dinorwig will not be tripped unless the power generated or absorbed by Dinorwig is

higher than a certain level, which may lead to overloading or instability. The status of

the Deeside-Pentir 1 & 2 circuits is determined by monitoring the associated circuit

breakers, and the line and busbar disconnectors. In addition, activation signals can also

be received from the main protection relays of the two circuits. If both circuits are

inoperative simultaneously, an intertrip signal is transmitted to Dinorwig to initiate the

scheme. When a tripping signal is received by the Dinorwig intertrip scheme, two


machines will be tripped to ensure the remaining power transfer to and from Dinorwig

does not exceed 1250 MW, preserving the stability of the area.
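
The initiation logic described above can be summarised in a few lines. This is a minimal sketch rather than the scheme's actual implementation: the function and parameter names are hypothetical, and the arming threshold is left as an input because the text only refers to "a certain level".

```python
def dinorwig_intertrip_required(
    deeside_pentir_1_in_service: bool,
    deeside_pentir_2_in_service: bool,
    station_power_mw: float,
    arming_threshold_mw: float,
) -> bool:
    """Return True when the intertrip should be initiated.

    station_power_mw    : power generated (positive) or absorbed while
                          pumping (negative) by the station
    arming_threshold_mw : hypothetical level above which the loss of both
                          circuits could cause overloading or instability
    """
    both_circuits_out = (
        not deeside_pentir_1_in_service and not deeside_pentir_2_in_service
    )
    power_above_threshold = abs(station_power_mw) > arming_threshold_mw
    return both_circuits_out and power_above_threshold


if __name__ == "__main__":
    # Illustrative values only
    print(dinorwig_intertrip_required(False, False, 1500.0, 1250.0))   # both out, high output
    print(dinorwig_intertrip_required(True, False, 1500.0, 1250.0))    # one circuit still in
    print(dinorwig_intertrip_required(False, False, -1500.0, 1250.0))  # both out, pumping
```

Combining the circuit status with the power measurement means that the loss of both circuits at low station output does not trip any machines.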

Existing line outage detection methods, as deployed by Generator Rejection Schemes,

are now reviewed. Fast detection of a line outage is considered an effective way to

initiate a system integrity protection scheme designed to prevent the Power System from

collapsing during severe events. According to the survey conducted in 2010, detection

of line outages can take three main forms, depending on the level of

security required by the scheme.

a) Monitoring breaker auxiliary contacts: This is a relatively simple mechanism;

however, it can be insecure in two respects. First, the switch

mechanism of the breaker auxiliary contacts can fail, especially during routine

breaker testing. Secondly, spurious breaker open signals can be unintentionally

generated during transients caused by other control signals. For example, coupling

into the breaker auxiliary contact wiring from other control signals in a cable way can

lead to transients that look like breaker open signals. Such transients can be

rejected by input-circuit debounce. However, debounce may also add significant delay to

the detection of breaker open signals by the SIPS.

b) Monitoring breaker status AND “undercurrent” signal: A more secure

mechanism can be implemented using a combination of breaker auxiliary contact

status and current measurements on the line. The zero-current detection decision

can be made by most digital relays within half a cycle, resulting in more

secure line outage detection [14].

c) Monitoring protective relay trip signals: This mechanism is used when the speed

of outage detection is paramount. Both the relay trip signals and the breaker failure

outputs need to be monitored by the scheme.

A combination of breaker status and undercurrent signal (i.e. method b) is used for line

outage detection. The line outage detection logic is shown in Figure 2-6. The decision is

made based on a combination of undercurrent (UC) detection on all three phases of the line

and the associated breaker open condition. In addition, an appropriate time delay is

added to avoid a fictitious line outage caused by power system transients.


Figure 2-6: Line Outage Detection Logic used in GRS
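
The detection logic can be sketched as a pickup timer applied to the AND of the three-phase undercurrent condition and the breaker open status. The sketch below is an illustration under stated assumptions, not the relay logic of Figure 2-6 itself: the sample format, the undercurrent threshold and the delay are hypothetical, and times are held as integer milliseconds to avoid rounding effects.

```python
from typing import Iterable, Tuple

# One sample: (time in ms, (Ia, Ib, Ic) in amperes, breaker open status)
Sample = Tuple[int, Tuple[float, float, float], bool]


def line_outage_declared(
    samples: Iterable[Sample], uc_threshold_a: float, delay_ms: int
) -> bool:
    """Declare a line outage when undercurrent on all three phases AND
    breaker open hold continuously for at least delay_ms."""
    condition_start = None
    for t_ms, currents, breaker_open in samples:
        undercurrent = all(abs(i) < uc_threshold_a for i in currents)
        if undercurrent and breaker_open:
            if condition_start is None:
                condition_start = t_ms            # start the pickup timer
            if t_ms - condition_start >= delay_ms:
                return True
        else:
            condition_start = None                # transient: reset the timer
    return False


if __name__ == "__main__":
    load, zero = (500.0, 500.0, 500.0), (0.0, 0.0, 0.0)
    genuine = [(0, load, False)] + [(t, zero, True) for t in range(5, 30, 5)]
    transient = [(0, load, False), (5, zero, True), (10, load, False)]
    print(line_outage_declared(genuine, uc_threshold_a=10.0, delay_ms=20))
    print(line_outage_declared(transient, uc_threshold_a=10.0, delay_ms=20))
```

A longer delay gives more security against fictitious outages at the cost of slower detection, so the setting has to be matched to the speed required by the scheme.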

2.4.2. PacifiCorp’s Jim Bridger RAS

The Jim Bridger Power Plant is located 22.7 miles east of Rock Springs in southwestern

Wyoming and is equipped with four 550 MW generators [26]. The power plant is

connected to the eastern Idaho transmission system via three 345 kV lines. There are

three 345 kV/230 kV transformers at Jim Bridger and three 230 kV transmission lines

connecting to the Wyoming transmission system. Loss of any transmission line from

Jim Bridger to the Western transmission system (Idaho) will cause overloading and lead

to system instability. In addition, during the fault, the voltage at the generator terminal

will significantly drop and the generator will accelerate. These problems will continue

until the faulted transmission line is disconnected from the system. Once this occurs, the

impedance of the transmission path will increase. The combination of an increase in

path impedance and generator acceleration will cause oscillation between generator

rotors and the Power System, which may lead to a generator out-of-step condition and

voltage swings at the Jim Bridger substation. Without the RAS, or when the RAS is not

in service, the output of Jim Bridger is restricted to 60% of its capacity. The Jim Bridger

RAS is therefore required to maximize the power transfer on the existing transmission

network and protect against dynamic stability problems. The following control

functions are performed by the RAS [27]:

Generator tripping:

- Arming level calculation

- Generation tripping requirement calculation

- Selection of units to trip

Series capacitor bypass control at Burns 525 kV reactive station (capacitor provides

30% compensation on the Midpoint to Summer Lake 525 kV line)

Shunt capacitor bank insertion at Kinport 345 kV and Goshen 161 kV

Permission for line series capacitor insertion at the Jim Bridger 345 kV


- Permission from Jim Bridger for Lag segment (1/3 of the total installed series

compensation) insertion at each 345 kV capacitor.

- Activation of subsynchronous resonance (SSR) protection for the generating

units at the Jim Bridger Power Plant.

Figure 2-7: Geographic Overview of PacifiCorp’s Jim Bridger Transmission System [26]

Due to the critical role of Jim Bridger RAS in preserving system stability, redundant

“input, output and processing” units are required to enhance the dependability of the

scheme. However, under most system conditions, the tripping of a 530 MW unit is not

required, especially if the load level is low and the fault is only a single-line-to-ground

fault. Therefore, enhancing security against false operation is critical in reducing

the operational costs. Consequently, a triple modular redundant (TMR) programmable

logic controller with two-out-of-three voting logic is deployed by the RAS as shown in

Figure 2-8. Within each RAS system, there are three identical systems gathering the

input/output (I/O) data. These perform two-out-of-three voting on the status and the

calculations. This process confirms whether an action is required. A total loop time of less

than 17 milliseconds is provided by the TMR system.


Page | 51

Figure 2-8: Jim Bridger RAS Triple Modular Redundant (TMR) System [26]
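
Two-out-of-three voting can be sketched as a majority function over three channel decisions. The probability expressions below assume identical and statistically independent channels, which is a simplifying assumption made here for illustration; the per-channel values are hypothetical and are not data from the Jim Bridger RAS.

```python
def two_out_of_three(a: bool, b: bool, c: bool) -> bool:
    """Majority vote: act only when at least two channels agree to act."""
    return (a and b) or (a and c) or (b and c)


def at_least_two_of_three(p: float) -> float:
    """Probability that two or more of three independent channels, each
    with probability p, are in the same (failed or spurious) state."""
    return 3.0 * p ** 2 * (1.0 - p) + p ** 3


if __name__ == "__main__":
    # Hypothetical per-channel probabilities, for illustration only
    p_fail, p_spurious = 0.01, 0.001
    print("vote(T, T, F):", two_out_of_three(True, True, False))
    print("vote(T, F, F):", two_out_of_three(True, False, False))
    print("P(DBM) 2oo3:", at_least_two_of_three(p_fail))
    print("P(SBM) 2oo3:", at_least_two_of_three(p_spurious))
```

Under these assumptions both the probability of a missed operation and the probability of a spurious operation are lower than for a single channel, which is why voting is attractive when dependability and security matter at the same time.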

2.4.3. Southern California Edison Centralised RAS

Due to the various green initiatives and renewable portfolio standards (RPS) mandates,

generation interconnection requests to the power grid have escalated dramatically in

recent years. A proliferation of RAS/SIPS solutions is now expected within the

Southern California Edison (SCE) service territory to economically accommodate

more renewable generation. However, due to an increasing number of RAS being

installed, SCE is now facing the challenges brought by the standalone nature of existing

SIPS implementations. The limited communication capabilities of RAS, especially to

other parts of the system, along with laborious maintenance and test practices,

significantly impede SCE’s ability to deploy the large number of new RAS required to

satisfy all the generation connection requests.

A breakthrough solution has been proposed by SCE that effectively centralises all the

existing standalone SIPS and the monitoring and protection functions to achieve better

RAS coordination and maintenance. The realisation of the Centralised RAS (CRAS) is

enabled by advanced field intelligent electronic devices (IEDs), fast computing

controllers, SCE’s extensive wide area fibre communication networks and the

continuously developing communication standards.


Figure 2-9: The Existing and Forecasted RASs in SCE’s Service Territory [28]

From an Information and Communication Technology (ICT) point of view,

several breakthroughs have been achieved by the CRAS. An Intel-based modern

computer, which is capable of IEC 61850 communications, is used as the central controller

instead of programmable logic controllers (PLCs). This provides faster computing

capabilities. SCE has more than 7000 miles of high-speed communication network,

which makes fast wide area communication possible. The IEC 61850-8-1 Generic

Object Oriented Substation Event (GOOSE) messaging [29] was chosen as the transport

mechanism and used for data transmission across the LAN and WAN in an IEC 61850

format. IEC Technical Report 61850-90-5 [30] provides details of a communication

protocol for event-driven GOOSE messages, designed to extend their application from a

LAN to a WAN. The WAN comprises dual-redundant T1 and Ethernet data

communication links. This is the first attempt to apply IEC 61850 over a large scale

wide area that involves monitoring and protection. The GOOSE message needs to be

encoded to ensure its security over WAN communication and reduce the vulnerability

related to cyber security. The Group Domain of Interpretation (RFC 6407 - GDOI) can

be used to provide symmetric keys to secure data signing and encryption [30].
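
The role of a shared symmetric key in message signing can be illustrated with a keyed hash. The sketch below is a generic example using Python's standard `hmac` module; it does not reproduce the IEC 61850-90-5 message format or the GDOI key exchange, and the payload, key and function names are hypothetical. The key is simply assumed to have been distributed to both ends already.

```python
import hashlib
import hmac


def sign_message(payload: bytes, key: bytes) -> bytes:
    """Compute an authentication tag over the payload with a shared key."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_message(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Accept the payload only if the tag matches (constant-time compare)."""
    return hmac.compare_digest(sign_message(payload, key), tag)


if __name__ == "__main__":
    shared_key = b"\x01" * 32                    # hypothetical pre-distributed key
    message = b"RAS-trip;substation=A;seq=42"    # hypothetical payload
    tag = sign_message(message, shared_key)
    print(verify_message(message, tag, shared_key))                       # genuine
    print(verify_message(b"RAS-trip;substation=B;seq=42", tag, shared_key))  # altered
```

A receiver that rejects messages with an invalid tag will not act on a trip command that has been altered or injected by a party without the key.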

The overall communication network layout of the centralised RAS is shown in

Figure 2-10. As required by the SIPS design guidelines, full redundancy of the CRAS is provided


by duplication of the control centre, the monitoring and mitigation relays and the

communication network. Each substation uses redundant relay sets, and implements

redundant and diversely routed telecommunication circuits to the control centres. The

Central Controllers of the CRAS are designed with triple modular redundancy (TMR)

and are installed at geographically separated locations (i.e. Grid Control Centre (GCC)

and Alternate Grid Control Centre (AGCC)). The controller at each site has an active

triple-redundancy controller, as well as a hot standby backup. Redundant Ethernet

communication links between the two controllers are provided to exchange the

information acquired from the substations. The CRAS also interfaces with SCE’s

Energy Management System (EMS) for data communication as well as model mapping.

Figure 2-10: SCE CRAS High-level Network Architecture [28]

The CRAS provides a platform for the system operators to migrate from “situation

awareness” monitoring to actionable grid control and protection strategies. Currently,

most system protection schemes are designed based on predetermined seasonal, off-line

and pre-planned mitigation strategies. With the CRAS platform, on-line, dynamic

and hierarchically layered control functions can be developed to leverage the capability

of the protection schemes. This will be further discussed in the following chapters.


2.5. Review of Major SIPS Maloperations

The following analysis of SIPS maloperations provides a better understanding of the

SIPS failure mechanism. It demonstrates the impact of SIPS maloperation and

unintended SIPS interactions on the propagation of system disturbances.

2.5.1. Irish System Disturbance, 5th August 2005

The Irish system disturbance caused a temporary loss of supply to 326,000 customers in

Ireland and a further disconnection of 74,000 customers in Northern Ireland. The

System Separation Detection Scheme and the Moyle Run-back Scheme are both

relevant to the development and spreading of the disturbance.

1) System Separation Detection Scheme:

The Ireland and Northern Ireland systems are interconnected by the Louth-Tandragee

275 kV interconnector and two smaller 110 kV interconnectors between Letterkenny

and Strabane and between Corraclassy and Enniskillen. Due to the limited transmission

capacities of the two 110 kV interconnectors, the system separation scheme is designed

to trip these two interconnectors after detecting the loss of the main 275 kV circuit.

2) The Moyle Run-back Scheme

With the advent of the Moyle direct current link between Northern Ireland and Scotland, the

system separation detection scheme is also used to command a change in flow on the

Moyle interconnector following a loss of the main Ireland-Northern Ireland

interconnection.

The run-back scheme is designed to alter the power flow on the Moyle interconnector. It

is used to prevent excess power in the Northern Ireland system resulting from loss of

interconnection to Ireland. The aim is to maximise the capacity for power flows on the

North-South Interconnector.

Before the incident, the All-Ireland Power System was operating normally with a total

demand of 3,302 MW, 377 MW of which was served by an import from Northern

Ireland. Additionally, the operating reserves were normal and the generating plant

availability was more than sufficient at 4,880 MW. However, several circuits were out

of service for maintenance, including one of the most critical lines, the Louth-Tandragee


No.1 275 kV circuit. This line is half of the interconnector to Northern Ireland and its

outage had a significant impact on the system disturbance.

Figure 2-11: The Ireland Transmission System Map [31]

At 10:22, an inter-trip signal was detected by the System Separation Detection Scheme

at Tandragee substation on the Louth-Tandragee No.1 275 kV circuit, which was out for

maintenance. The false detection of this signal was reported to be caused by radio

interference. The signal was sent to the Moyle DC interconnector to Scotland

and triggered a run-back, which reversed an import of 115 MW to an export of 168 MW and

reduced the frequency to 49.52 Hz. In addition, the signal was also transferred to

Enniskillen and Strabane and tripped the two standby 110 kV interconnectors. At 10:24,

a second run-back was incorrectly triggered on the Moyle Interconnector, increasing the

power export to Scotland from 168 MW to 416 MW and the frequency dropped to

48.82 Hz, which is lower than the 48.85 Hz trip frequency of the 1st stage of the under-

frequency load-shedding scheme. The second run-back was caused by the two-minute timer in

Ballycronan More, which triggered when it timed out because it was still

monitoring the original inter-trip signal in Tandragee. Both interruptible and normal

tariff customers were automatically disconnected. In addition, two generator units (i.e.

Tarbert unit 4 and Moneypoint unit 1) were tripped due to the low frequency after the

second run-back. This led to further under-frequency load shedding, amounting to 15.6% of the south

of Ireland system demand and 11.4% of the Northern Ireland system demand. To help

recover the system frequency, the power export to Scotland was reduced from 416 MW

to 250 MW by blocking 50% of the Moyle interconnector. Finally, most of the


customers were reconnected by 12:00, i.e. 98 minutes after operation of the System

Separation Detection Scheme.

Figure 2-12: Frequency change during Irish Disturbance on 5th August 2005 [10]
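
The size of the two run-backs follows directly from the figures quoted above. The short calculation below uses only those reported values; the sign convention (import positive, export negative) and the variable names are choices made here for illustration.

```python
# Moyle flow seen from the island, in MW: import positive, export negative
flow_before = 115           # import before the incident
flow_after_first = -168     # export after the first run-back
flow_after_second = -416    # export after the second run-back

first_swing = flow_before - flow_after_first          # power lost to the island
second_swing = flow_after_first - flow_after_second
total_swing = flow_before - flow_after_second

UFLS_STAGE_1_HZ = 48.85     # first-stage under-frequency load-shedding setting


def ufls_stage_1_operates(frequency_hz: float) -> bool:
    """First-stage load shedding operates below its frequency setting."""
    return frequency_hz < UFLS_STAGE_1_HZ


if __name__ == "__main__":
    print(first_swing, second_swing, total_swing)
    print(ufls_stage_1_operates(49.52), ufls_stage_1_operates(48.82))
```

The first run-back therefore removed 283 MW from the island and the second a further 248 MW, 531 MW in total, and only the second took the frequency below the first-stage load-shedding setting.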

The performance of the remedial protection schemes in the formation of the Irish

system-wide disturbance was then analysed. The primary cause of the system

disturbance was the incorrect detection of a separation of the two Power Systems on

the island. This highlighted the importance of immunity to interfering signals. It is

also recommended that the outputs from the Power Line Carrier on out-of-service

circuits should be blocked to prevent receipt of spurious activation signals. To help

prevent this incident occurring again, the inter-trip signal latch has been removed from

Tandragee to Ballycronan More, which ensures more secure communication and

prevents the second run-back signal for the Moyle Run-back scheme. The performance

of an under-frequency load-shedding scheme is still considered necessary in preventing

system collapse.

2.5.2. SIPS Maloperation in Nordic Grid, 1st of December 2005

Two system protection schemes are installed in Norway to deal with the challenges

brought by the high power generation from the northwest of the main bottleneck. A

brief introduction of the schemes is given:

1) Nordland SIPS:

The grey-shaded area (northern Scandinavia) in Figure 2-13 contains nearly 15% of the

installed hydroelectric capacity (i.e. approximately 6000 MW) in the Nordic grid, but

has a low load demand. This leads to a large power transfer from the north to the south,

where the main load centre is located (Oslo). The outage of any critical transmission

corridor may cause the overloading of other transmission lines. In this case, the

Nordland SIPS is designed to shed up to 1200 MW generation in the north, and split and

disconnect the northernmost part of Norway from the main Nordic grid if there is

surplus generation in the northernmost part of Norway.

2) Østland SIPS:

The eastern part of Norway around Oslo, shown in the yellow-shaded region in Figure

2-13, is the main load centre. The Østland SIPS is activated when there is an outage or

overload of the central lines in the Oslo area. In this case, it is designed to shed up to

1200 MW of generation on the west coast of Norway, protecting the remaining

transmission line in Oslo from overloading.

Before the incident, the Nordland scheme was armed due to the high hydroelectric

production in the northwest of the Nordic grid (2300 MW out of the area). Both network

split and generator tripping functions were activated. In addition, the Østland SIPS was

activated due to high power transfer from Norway to Sweden (2100 MW). At 15:02 on

the 1st of December 2005, a fault occurred on a 420 kV reactor and then the breaker

failed to open. The operation of a busbar differential protection at Porjus cleared the

fault. This also led to the outage of one main transmission circuit from Northern

Scandinavia through Sweden. The remaining power transfer corridors out of northern

Scandinavia then became overloaded, which means the operation of the Nordland SIPS

was required. However, the delayed operation of the Nordland SIPS led to a series of

cascading events. System operation outside the operational limits required the activation

of the second SIPS at Østland, which was expected to trip 1150 MW production on the

west coast of Norway. The operation of the Østland SIPS would cause a frequency drop

and eventually trigger the under-frequency load shedding of 2400 MW of load.

Fortunately, the Østland SIPS also failed to operate, which prevented the breakdown of the

entire Nordel grid. Finally, the manual activation of 2000 MW fast reserve production

helped stabilize the system disturbance.


Figure 2-13: Nordic Grid and the Protection Schemes [11]

The latency in the Nordland SIPS was due to the delay in the substation communication.

The failure of the Østland SIPS to operate was due to a human error after maintenance

testing; this failure saved the Nordic grid from collapse. The overload limit of the Østland

SIPS was increased after the incident to ensure that it does not trigger in a similar case. This

event indicated that in the SIPS design phase, system studies must be performed to

ensure the SIPS arming criteria are correctly designed. In Chapter 6, a method to

determine the arming point of a SIPS is provided by comparing the system risks with

and without SIPS. The Nordic SIPS maloperation also highlighted the importance of

reliable SIPS operation in preventing the spread of a system disturbance. In addition,

with an increased number of SIPS in the system, the relationship between multiple SIPS

needs to be studied to mitigate undesirable interactions. A risk assessment method to

evaluate the impact of undesirable interactions between different SIPS will be discussed

in Chapter 7. In particular, SIPS with adaptive operational logic will be designed to

effectively mitigate the risk caused by SIPS interactions.

2.6. Summary

This chapter describes the fundamental features of SIPS and the need to evaluate its

reliability. By reviewing the surveys and implementation guidelines, some critical

design considerations and the application of the latest technology for SIPS are described.

SIPS can be applied to provide corrective control actions for various abnormal system

conditions to preserve system integrity. It is therefore vitally important to ensure a

highly reliable SIPS performance in terms of dependability and security. In recent years,

the proliferation of SIPS and the increased incidents of SIPS maloperations both call for

an effective method to quantitatively assess the additional operational risks brought by a

SIPS implementation to the overall Power System.

The review of the existing SIPS applications in this chapter covers industry practices

and approaches to the use of new technologies for monitoring, communication and

control to further enhance SIPS performance. SIPS can be implemented either locally or

system-wide with hierarchical architecture and multi-level corrective actions. The

introduction of advanced information communication technology (ICT) and intelligent

electronic devices (IED) brings more flexibility in SIPS design and operation. By

reviewing SIPS related system disturbances, it can be seen that component hardware

failures are the most common cause of SIPS maloperations. However, in the future,

with the wider use of ICT and IEDs in SIPS, software is more likely

to become the main cause of SIPS failure. According to NERC, 72% of

recent SIPS maloperations were security-based, so the implementation

of redundancy in SIPS communication networks needs to be carefully assessed. The

method of using voting and vetoing schemes to balance the trade-off between scheme

dependability and security has been illustrated in the TMR designs. The extensive wide

area communication network, together with the GPS time synchronisation, made it

possible to centralise the decision making in the existing distributed SIPS and achieve


better management and coordination. A more detailed description of the ICT used in

SIPS applications will be provided in Chapter 3.

Major SIPS related system disturbances illustrate the significant consequences of a SIPS

maloperation. In the pre-cascading phase, an effective and quick operation of SIPS is

vital in preventing the spread of the disturbance. Incorrect, delayed or failed SIPS

operation increases the probability of the system entering the cascading phase and may

eventually lead to severe consequences such as load disconnection. The Ireland

disturbance indicated the importance of secure communication in successful operation

of SIPS. Redundancy in SIPS activation signals made available by duplicated

communication channels may adversely affect SIPS reliability and increase the risk

caused by spurious SIPS operation. Consequently, the implementation of redundancy in

SIPS design needs to be assessed and appropriate voting or vetoing logics can be used to

enhance SIPS security. In addition, both of the reviewed system disturbances involve

the operation of more than one protection scheme, which highlights the necessity of

not only assessing the reliability of individual SIPS performance but also understanding

the possible interactions between SIPS.

CHAPTER 3

ASSESSING THE IMPACT OF ICT

RELIABILITY ON SIPS APPLICATION

3.1. The Role of ICT in Power System Protection

The effective and reliable provision of electrical power is increasingly reliant on

information and communication technology (ICT). The massive expansion of these

technologies in the Power System allows the system operator to respond to the danger

caused by an abnormal system condition in a more effective and timely manner. This

helps to prevent the propagation of large system disturbances [32] and has been widely

applied in the design of SIPS as introduced in Chapter 2. Moreover, the increasingly

varying operating conditions arising from the integration of significant renewable

sources and the highly interconnected network make it extremely difficult to maintain

system stability without advanced detecting, monitoring and visualization methods, as

enabled by ICT.

In addition, ICT brings more flexibility to conventional protection functions and

significantly facilitates the implementation of wide area monitoring, protection, and

control (WAMPAC) systems. This helps promote the development of novel protection

approaches, such as special protection schemes, real-time control of HVDC and FACTS,

and stability monitoring. The processes included in conventional protection and control

strategies are normally straightforward, especially because they do not include system-

wide supervisory control to monitor and regulate possible failures in the related fault

evolution process. Recent blackouts have also emphasised the limitations of local

protection and the need to enhance system reliability by developing WAMPAC systems

[12, 13]. Consequently, the implementation of ICT and WAMPAC will continuously

help enhance the stability and efficiency of the future Power System.

3.1.1. Impact of ICT on Power System Protection

The role of conventional Power System protection is to disconnect the faulty or

overloaded elements from the rest of the electrical network. However, changes in Power

Systems are making it increasingly difficult to find one appropriate protection setting

that will suit all the different system conditions and operating contingencies. Meanwhile,

Power Systems are becoming more vulnerable to system wide disturbances and this

requires a coordinated wide area response across the entire system. The WAMPAC

system, made available by advances in ICT, is expected to improve the performance of

the protection system and especially the following aspects:

1) Managing wide area disturbance: the introduction of wide area monitoring in the

protection system improves the resilience of Power Systems against stressed

conditions and wide area disturbances. The increasing availability of real-time wide

area measurements has enabled SIPS to be applied with fast and adaptive protection

actions for system contingencies. Another example is the use of a WAM system to

supervise backup protection (i.e. Zone 3) within a distance relay, whose maloperation was

identified as a main contributor to recent blackouts [12, 15]. The backup zones of distance

relays can be entered during extreme system conditions such as changing loads,

power swings and generator loss-of-field. The use of WAMPAC could improve the

performance of the backup protection by restraining the backup relays in the event

of a load swing. As illustrated in Figure 3-1, when the Zone 3 protection of relay A

picks up and a significant negative sequence current is detected, the Zone 3

pickup decision is appropriate and necessary. However, if the currents in the line are

balanced, the Zone 3 pickup could be caused by either a balanced three-phase fault

or load encroachment. In this case, the remote PMUs installed within the protection

zone of the backup relay will monitor the current and determine whether there is a

Zone 1 three-phase fault. If none of them detects a three-phase

fault, the Zone 3 pickup of relay A must be caused by load encroachment and the

trip decision will be blocked.

[Figure 3-1 (diagram): Zone 3 of Relay A shown together with Zone 1 of Relays B, C, D and E. Supervision of Zone 3 of Relay A: on a Relay A Zone 3 trip, the relays at B, C, D and E are checked for a Zone 1 pickup; if yes, the Relay A Zone 3 trip proceeds; if no, Relay A Zone 3 is blocked.]

Figure 3-1: Supervision of Backup Relays to Prevent Zone3 Maloperation

2) Mitigating the impact of hidden failures: A hidden failure is defined as a

permanent, undetected defect in a protection relay which causes a relay to operate

incorrectly and remove elements of the system as a consequence of another

switching event in the system [33]. The impact of a hidden failure can be tackled by

using ICT to collect measurements from multiple relays and using the information to

confirm or approve the trip decision. This could prevent a single hidden failure in

any one of these relays from causing an incorrect operation.

3) Adaptive relaying: The concept of adaptive relaying is that the protection devices

can automatically make adjustments to make them more attuned to prevailing

system conditions. An example of an adaptive relaying application is a design that

balances the dependability and security of a protection scheme. Power System

protection was traditionally designed with a bias towards dependability [34], which

could be beneficial in a system with a robust transmission network and sufficient

generation reserve. However, during a wide area disturbance, the erroneous loss of

an unfaulted element can be a major threat to a stressed system and can accelerate the

process leading to a cascading failure and even a blackout. Consequently, the

preference for dependability may result in inappropriate tripping operations and

bring greater risks to the system. Therefore, the shifting of the balance from

dependability towards security during stressful system conditions, as detected by the

wide-area monitoring system, could be attractive.
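The Zone 3 supervision logic described in item 1) and Figure 3-1 can be sketched as follows. This is a minimal illustration only: the names `Zone3Pickup` and `supervise_zone3` are hypothetical, and the Boolean inputs stand in for relay and PMU measurements that a real scheme would derive from thresholds and timers not specified here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Zone3Pickup:
    """Snapshot taken when the Zone 3 element of the backup relay picks up.

    Hypothetical structure used for illustration only.
    """
    negative_sequence_detected: bool  # significant negative sequence current


def supervise_zone3(pickup: Zone3Pickup, remote_zone1_pickups: Iterable[bool]) -> str:
    """Return "TRIP" or "BLOCK" for a Zone 3 pickup of the backup relay."""
    # Unbalanced currents: the pickup is treated as appropriate and necessary.
    if pickup.negative_sequence_detected:
        return "TRIP"
    # Balanced currents: either a three-phase fault or load encroachment.
    # A Zone 1 pickup reported by any remote device confirms a fault.
    if any(remote_zone1_pickups):
        return "TRIP"
    # No remote device sees a fault, so the pickup is attributed to load.
    return "BLOCK"


# Balanced currents and no remote Zone 1 pickup at relays B, C, D, E.
print(supervise_zone3(Zone3Pickup(False), [False, False, False, False]))
```

The sketch only captures the decision order given in the text: unbalance first, then confirmation from the remote measurements.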



3.1.2. Impact of ICT on SIPS

A SIPS protects system security from the effects of extreme system contingencies and

wide area disturbances, which are beyond the scope of traditional protections. In recent

years, some more advanced SIPS, based on the real time wide area measurements made

available by WAMS, have been proposed to protect the Power System from wide area

disturbances under various system conditions [35-37]. The advanced ICT enables real

time monitoring of system conditions and provides a more accurate state estimation to

facilitate the decision making of SIPS. With phasor measurement units (PMUs) and

the Global Positioning System (GPS), time-tagged measurements of Power

System quantities across the entire network can be collected to provide opportunities for

system wide SIPS control actions. The synchrophasors can be applied to solve system

stability problems such as oscillatory stability, voltage stability and transient stability.

In addition, the system event triggering can also be provided by the synchrophasors

based on the measurement of current, voltage, frequency and the rate of change of these

measurements.

WAMPAC has already been applied in industry to enhance SIPS performance.

Some utilities use the WAMPAC system to centralise the existing standalone

SIPS to achieve better coordination and easier maintenance [28]. Others utilise real time

wide area measurement to more precisely predict the complex emergency operational

states and adaptively adjust to ensure a quick and decisive response [35, 36].

The building of an extensive telecommunication network and the continuous

development of communication protocols enable the exchange of information over Wide

Area Networks (WAN). For example, the GOOSE message, which was originally

intended for communication within a local area network (LAN) environment, is now

implemented for wide area protection and control applications. As described in IEC

61850-90-1 [38], a special router configuration is used to tunnel GOOSE messages among

substations or between substations and the control centre. This provides a

secure transfer of GOOSE messages across the WAN in an IEC 61850 format. The use of

routable-GOOSE (R-GOOSE) is an emerging solution to improve wide area Power

System monitoring, protection and control and achieve a centralised SIPS application.



3.2. Communication Infrastructure of SIPS

3.2.1. General SIPS Communication Infrastructures

An overview of the SIPS communication infrastructure, together with its reliability

considerations, is given first in this section. SIPS can be classified into local SIPS

and system wide SIPS, based on their communication infrastructure, as discussed in the

previous chapter. Most of the existing SIPS implementations are local schemes and are

distributed in isolated local substation environments with limited communication with

other parts of the system. In such a scheme, all the sensing, communication and decision

making devices are located in a single substation [14], making the protection action

highly reliant on the local substation automation system (SAS).

With the development of ICT, SIPS are now being designed to address system wide

contingencies, which require measurements from all over the network. In addition,

utilities are now facing more complexity in SIPS operation, following the increase in

the number of SIPS being installed in the power networks for hierarchical control

actions. The standalone nature and the widespread proliferation of the SIPS require an

extensive control and maintenance effort. This may also lead to a higher probability of

unintended SIPS interaction and may impede the ability to deploy additional SIPS into

the power network to enhance its capability and accommodate more renewables. All of

these challenges in the development of SIPS call for a breakthrough solution, namely to

centralise the existing standalone SIPS to achieve better development,

coordination and maintenance. However, this breakthrough depends on advanced fast

detecting IEDs, fast computing processors, the availability of extensive Wide Area

Networks (WAN) and continuous development of communication protocols (e.g.

IEC61850, IEC62439, etc.).

The implementation of SIPS with a significant degree of centralisation using the WAN

has been completed by some utilities. PacifiCorp’s Jim Bridger RAS [26]

implements a dual triple modular redundant (TMR) system to initiate its region-wide

RAS applications. Information from the neighbouring RAS, located at the Idaho Power

Midpoint Substation, is centralised by the Jim Bridger RAS. The two neighbouring

SIPS are designed to complement each other with intertripping and coordination.

A wide area SIPS was installed at the Salt River Project (SRP) system [39] to centralise



the measurements of generation outputs and implement the load shedding calculation

using GOOSE messages, a Virtual LAN (VLAN), and priority messaging technologies

enabled by IEC 61850. Southern California Edison (SCE) [28] is now developing a

centralised RAS (CRAS) which centralises the arming calculations and the RAS logic

over a wide area monitoring and protection network involving over 100 substations.

This establishes a platform for central control.

[Figure 3-2 (diagram): three functional layers linked by LAN and WAN communication through switches and routers. 1. Monitoring and Detection (Substations #1 to #N with MUs, BIEDs and relays): line flow monitoring, load level monitoring, line outage detection. 2. SIPS Central Controller (Centralized Control Centers A and B; Controller A shown with Processors A1, A2 and A3): arming calculation and logic checks, mitigation calculation. 3. Mitigation (generation plants/load substations #1 to #N with control IEDs and BIEDs): generation/load level monitoring, generator tripping, load shedding.]

Figure 3-2: General SIPS Architecture with Central Processors

To realize the centralisation of existing distributed and standalone SIPS, an extensive

high speed communication infrastructure between the local substations and the system

wide communication network is required. Figure 3-2 describes an overall

communication architecture of a SIPS installed for centralised control, with redundancy

applied to the monitoring devices, central controllers, mitigation relays and the

associated communication networks. The monitoring and detection IEDs are located at

substations spread over the entire network and used for different monitoring

applications. The monitoring function involves the measurement of power flows on the

lines, and the voltages, frequencies, rate of change of frequency and when applicable

other parameters related to specific system conditions. The centralised SIPS controller

gathers all the monitoring and protection data via a wide area network (WAN) and

decides whether corrective action is required. The physical architecture of its

communication network is determined by the size of the scheme and the location of the

detection sites and the mitigation actions. Once the SIPS operation is required,



commands must be sent to the control IEDs in the field to initiate corrective actions (e.g.

generator tripping, load shedding, etc.).

3.2.2. Wide Area Communication Network

The WAN is used to gather all the information required for SIPS decision making from

the various detection sites and to communicate the control commands to the mitigation

devices at different locations in the system. The Synchronous Optical Network (SONET)

and Synchronous Digital Hierarchy (SDH) protocol based architectures are normally

used by utilities for the wide area communications between major substations in their

power network. SONET and SDH are standardized protocols that transfer multiple

digital bit streams synchronously over optical fibre. The fibre-based communication

system is normally built in a ring configuration as shown in Figure 3-3. Redundant

communication paths are provided by its bi-directional data ring topology. Data

exchange between WAN nodes mainly relies on the primary ring of the dual ring WAN.

If the primary ring is lost, SONET equipment can switch to the

backup data flow in as little as 4 milliseconds [40, 41].

[Figure 3-3 (diagram): SONET/SDH dual ring with a primary ring and a backup ring linking Nodes 1 to 4 (the control centre and Substations #1 to #3) through fibre interfaces FI 12, FI 23, FI 34, FI 41 and FI 14, FI 43, FI 32, FI 21.]

Figure 3-3: WAN SONET Architecture

The control centre for the system wide centralised SIPS collects data from all the major

substations across the network and processes all the logic. Since the correct and timely

response of the centralised SIPS is critical to the stability and reliability of the large

scale wide area power networks, the central controller must be highly dependable. For

this reason, redundant inputs, outputs, processor units and telecommunication networks

must be implemented. Security is equally important, because unexpected remedial actions caused by spurious

SIPS operations could involve significant costs. To balance dependability with

security and achieve the optimal performance, dual triple modular redundant (TMR)

voting control systems are developed by utilities as shown in Figure 3-2. The two

controllers are installed in geographically separated locations and back each other

up. In each central controller, there are three processors. Two of them must reach

the same decision to initiate an operation. There are also Ethernet links between the two

systems for data exchange.
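The two-out-of-three voting inside one TMR controller can be sketched as below. This is an illustration under stated assumptions: the function name `tmr_vote` is hypothetical, each processor is reduced to a single Boolean "operate" decision, and the coordination between the two geographically separated controllers is not modelled.

```python
from typing import Sequence


def tmr_vote(decisions: Sequence[bool]) -> bool:
    """Two-out-of-three vote over the processors of one central controller.

    Returns True (initiate the SIPS operation) only if at least two of the
    three processors reach the same "operate" decision.
    """
    if len(decisions) != 3:
        raise ValueError("a TMR controller has exactly three processors")
    return sum(bool(d) for d in decisions) >= 2


# One processor spuriously asserts "operate": no operation is initiated.
print(tmr_vote([True, False, False]))
# One processor fails to assert "operate": the operation still goes ahead.
print(tmr_vote([True, True, False]))
```

The two calls illustrate why this arrangement balances security and dependability: a single faulty processor can neither cause a spurious operation nor suppress a required one.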

3.2.3. Substation Automation System

A Power System substation, as a key node in a power network, plays a vital role in

monitoring and controlling power flows and interconnecting generating facilities,

transmission and distribution networks and customers. Successful operations of both

local and system-wide SIPS are heavily reliant on the monitoring, communication and

control functions in the substation automation system. A substation consists of

numerous items of switchgear and measuring devices, and these are controlled,

supervised, and protected by the Substation Automation System (SAS). The main

features of the SAS are to [42]:

• Control or monitor all the electrical equipment in a substation

• Communicate to remote SCADA system

• Control or monitor electrical equipment in a local bay

• Monitor the status of all the connected substation automation equipment

• Monitor the condition of substation electrical equipment (e.g. switchgear, transformer, relays, etc.)

• Manage the energy flows

The successful operation of SIPS is heavily reliant on the instrumentation, monitoring,

communication, control and protection systems used in the SAS. The measurements

required by SIPS are collected by substation based sensor IEDs and are then transmitted

to a station host computer. The data are then used for either local decision making or

sent via a WAN for centralised decision making. The communication infrastructure of

the SAS and its reliability are studied in the following sections.



The advent of IEC 61850 [43] significantly facilitates the communication services in a

substation and overcomes problems of interoperability between different devices. The

development of SIPS, which is one type of special protection and control system, will

be influenced by fast and highly reliable communications, as provided by an IEC 61850

based SAS. Data transmission in a substation, in accordance with IEC 61850, is based

on the necessary data model and communication services. The use of IEC 61850

significantly improves the reliability of the SAS by replacing a multitude of copper

wires with serial communication links (e.g. fibre optic) [44]. Successful

communications in SAS also rely on a reliable physical Substation Communication

Network (SCN) architecture. Redundancy in each communication layer is required to

eliminate single-point failures in the system.

Figure 3-4: Substation Automation Architecture from Hardwire to IEC 61850 [45]

3.2.4. Centralised SIPS: Speed Requirement

To achieve the centralisation of the distributed schemes, a high speed broadband

communication network is required to deliver the measurement information and the

control signals. The time requirement for SIPS varies with different applications. For

SIPS designed to mitigate the overloading in transmission systems, the operating time

requirements can be several minutes. However, for SIPS designed to improve transient



stability, a timeframe of 100 milliseconds is normally required. Therefore, the overall

speed for the mitigation action must be fast enough to satisfy the application

implemented in the SIPS with the harshest time requirement. By reviewing the current

SIPS time requirements and stability studies applied to the most severe faults [14, 26],

the total time from the triggering event to the SIPS actions must not exceed 5 cycles.

Therefore, for a 50 Hz system, a timeframe of 100 ms was set as the time requirement

under the most severe conditions. A time allocation for the timeframe is illustrated in

Figure 3-5.

[Figure 3-5 (diagram): timeline from normal operational conditions (stability with robustness), through degraded conditions (stability without robustness, preventive actions), to the triggering event, the SIPS actions (required timeframe), and instability with the evolution of the collapse. Time allocation within the SIPS timeframe: monitoring relay processing 16 ms; data transmission to controller 19 ms; central controller processing 15 ms; data transmission to mitigation device 19 ms; mitigation relay processing 5 ms; trip contacts 25 ms.]

Figure 3-5: Time Breakdown of a Time-Critical SIPS Application

The breakdown of the SIPS timeframe includes 16 ms for the detection relays to detect

the fault, 15 ms for the decision making in the central controller and a total of 30 ms for

the mitigation relay to trigger remedial action [46]. A time interval of 19 ms is left for

data transmission in each direction between the relays and the control centre. Great effort

has been made by utilities to test the speed performance of the wide area

communication network used for centralised SIPS. Testing results indicate that the

communication speed of the WAN is fast enough for SIPS applications. For example,

SCE uses routable-GOOSE messages for wide area communication. The testing

results indicate that a 19 ms time interval for bi-directional data transmission is sufficient for

data transport over a 660 mile communication network (far enough to cover most

remote locations of SCE’s service territory) [46]. This leaves sufficient time margins

for other possible delays in the communication process.
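The time allocation can be checked with a few lines of arithmetic. The sketch below uses the values shown in Figure 3-5 and the 5-cycle limit at 50 Hz quoted above; the dictionary keys and the helper `timeframe_ms` are names chosen here for illustration.

```python
# Time allocation read from Figure 3-5 (all values in milliseconds).
TIME_ALLOCATION_MS = {
    "monitoring relay processing": 16,
    "data transmission to controller": 19,
    "central controller processing": 15,
    "data transmission to mitigation device": 19,
    "mitigation relay processing": 5,
    "trip contacts": 25,
}


def timeframe_ms(cycles: float, frequency_hz: float) -> float:
    """Convert a number of power-frequency cycles into milliseconds."""
    return cycles * 1000.0 / frequency_hz


limit_ms = timeframe_ms(cycles=5, frequency_hz=50)  # 5 cycles at 50 Hz
total_ms = sum(TIME_ALLOCATION_MS.values())
margin_ms = limit_ms - total_ms

print(f"limit = {limit_ms} ms, allocated = {total_ms} ms, margin = {margin_ms} ms")
```

With these figures the allocation sums to 99 ms against the 100 ms limit.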



The communication speed within a SAS is also critical in satisfying the speed

requirement of each SIPS application. By reviewing the IEEE standard communication

delivery time performance requirements for electric power substation automation [47],

the time requirements for different substation automation applications are listed in Table

3-1. Therefore, for the information with the highest time-critical level, a communication

time of ¼ cycle (i.e. 0.005 s) can be achieved by the SAS. Consequently, it has been

shown that an extensive high speed wide area communication network, together

with fast computing central controllers and high speed substation automation system

communication, fulfils the time requirement for different SIPS applications.

Table 3-1: Grace Time for Substation Automation Systems

Applications | Typical grace time
Uncritical automation applications, e.g. enterprise resource planning, manufacturing execution | 10 s
Automation management, e.g. human interface, SCADA, building automation, thermal | 2 s
General automation, e.g. process & manufacturing industry, power plants | 0.2 s
Time-critical automations, e.g. synchronised drives, breaker failure protection, back-up breaker tripping, etc. | 0.005 s

3.3. IEC 61850 based Substation Automation System and its Reliability Model

A highly reliable, fast and deterministic communication network is vital for the

successful execution of a SIPS application. The penetration of ICT brings significant

changes in instrumentation, monitoring, communication, control and protection systems

in the Power System. With more hardware devices, software routines and user defined

settings, concerns about the reliability of digital communication dominated protection

and control systems have been raised. In particular, the application of the IEC 61850

substation automation system and microprocessor-based multi-function IEDs provides

more flexibility in system design and opens up a vast range of solutions for SAS

architecture [48]. Different SAS communication architectures, implemented in

accordance with various communication protocols, are reviewed in this section. A

method to quantitatively evaluate the reliability of different communication services in



the SAS is proposed. The method could help ensure the reliability levels required by

various SIPS applications are achieved.

3.3.1. IEC 61850 based Substation Station Bus Architectures

In general, SAS is a hierarchical structure comprising three levels, namely the station

level, the bay level and the process level. These three levels are connected using two

buses: the station bus and the process bus. The station bus facilitates the communication

between the protection, control and monitoring IEDs installed at the bay level with the

station level devices, such as the station computer with the human machine interface

(HMI) and gateway to the network communication centre; whilst the process bus

connects the bay units (i.e. protection and control devices) with the switchyard devices

(e.g. breakers, CTs and isolators, etc.). Multiple network redundancy protocols can be

implemented to enable communication network reconfiguration and self-healing of the

communication path in case of device or link failures.

3.3.1.1. Star & Ring Station Bus Architectures

Figure 3-6 shows two typical substation communication network (SCN) architectures:

star and ring. For the star architecture, a central station-level switch is used to connect

all the bay switches to the IEDs allocated in each bay. Ethernet switches provide a

common connection point for devices by storing incoming packets and forwarding them

to the specified destination on the LAN. In the star architecture, the central switch becomes a single

point of failure for the whole SCN and thus significantly affects the reliability of the

station bus communication network. Communication redundancy can be achieved by

using the ring architecture, which involves forming a ring of switches. The Rapid

Spanning Tree Protocol (RSTP) as defined in IEEE 802.1w can be integrated into the

ring structure to prevent communication loops which may cause flooding due to data

duplication and recirculation. The RSTP protocol automatically readjusts to failure, by

sending data to its destination in the opposite direction upon detecting a break at one

point of the ring. This helps to achieve the so called “standby” or “dynamic”

redundancy. When the primary path failure is detected in the RSTP ring, the alternative

standby path needs to be switched into action within a certain amount of operating time,

which is called the reconfiguration time. A typical reconfiguration time of 2 s can be

provided by the RSTP ring architecture. This is more rapid than the conventional

Spanning Tree Protocol (STP), which has an average switchover delay of 30 s in the

event of a failure [49].

Figure 3-6: Star (left) & Ring (right) Type SCN Architectures

Standby redundancy provided by STP and RSTP requires switchover time when the

primary path fails. However, with the advent of the High-availability Seamless Redundancy (HSR)

protocol as standardised in IEC 62439 [50], bumpless redundancy for ring topology

networks can be provided. A simple HSR network is indicated in Figure 3-7. A Doubly

Attached Node running HSR (DANH) simultaneously sends duplicated multicast frames

(i.e. A & B frame) to the recipients on the network. If one of the communication paths

fails, the destination can still receive the signal from the other communication path

without any reconfiguration time.

Figure 3-7: Example of IEC 62439-3 HSR Network [42]



3.3.1.2. Parallel Redundancy Protocol based Station Bus Architectures

The concept of the Parallel Redundancy Protocol (PRP) as standardised in IEC 62439-3

[50] is to connect the IEDs with two separate and independent Local Area Networks

(LAN A & LAN B) and to simultaneously send the duplicated Ethernet packets through

these two networks. Consequently, if one data frame fails to reach the destination due to

traffic or network failure, the destination can still receive the required information from

the other network without any reconfiguration time, hence providing seamless

redundancy. Figure 3-8 shows the redundant double star and double ring station bus

architectures, implemented in accordance with the IEC 62439-3 PRP. Note that

IEDs which are Singly Attached Nodes (SAN) can be connected to the PRP networks

via a Redundancy Box (Redbox).
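The seamless behaviour of PRP can be illustrated with a small receiver-side sketch. It is a simplification under stated assumptions: the `Frame` fields and the function `deliver_once` are hypothetical names, duplicates are recognised by a (source, sequence number) pair, and the table of seen frames grows without bound, whereas a real device would use a bounded window.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Frame(NamedTuple):
    """Simplified frame as seen by a doubly attached receiving node."""
    source: str    # sending IED (hypothetical identifier)
    sequence: int  # sequence number shared by both copies of a frame
    lan: str       # "A" or "B", the network the copy arrived on
    payload: str


def deliver_once(received: Iterable[Frame]) -> Iterator[Frame]:
    """Pass the first copy of each frame to the application, drop the second."""
    seen: Set[Tuple[str, int]] = set()
    for frame in received:
        key = (frame.source, frame.sequence)
        if key in seen:
            continue  # duplicate that arrived over the other LAN
        seen.add(key)
        yield frame


# The copy of frame 2 on LAN A is lost; the copy on LAN B still arrives.
arrivals = [
    Frame("IED1", 1, "A", "trip=0"),
    Frame("IED1", 1, "B", "trip=0"),
    Frame("IED1", 2, "B", "trip=1"),
    Frame("IED1", 3, "A", "trip=0"),
    Frame("IED1", 3, "B", "trip=0"),
]
delivered = list(deliver_once(arrivals))
print([(f.sequence, f.lan) for f in delivered])
```

Because both copies are always in flight, the loss of one LAN needs no switchover; the receiver simply takes whichever copy arrives.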

Figure 3-8: Redundant Double-Star (left) & Double-Ring (right) SCN Architectures

3.3.1.3. Reconfiguration Time for Common Redundancy Protocols

The switchover delays for different communication protocols are reviewed and listed in

Table 3-2. The reconfiguration time following the failure of the primary communication

path varies with different communication protocols. The STP and the RSTP can switch

to the alternative communication path in a time range from 2 to 20 seconds, which is

acceptable for the data used for SCADA or HMI applications. However, it may not

fulfil the requirements for time-critical substation automation such as protection and

underfrequency load shedding. Communication path redundancy, provided by HSR and

PRP, is ideal for all substation automation applications since no switchover time is

required.

Table 3-2: Reconfiguration Time for Common Redundancy Protocols

Protocol | Description | Typical recovery time
STP | Spanning Tree Protocol | 20 seconds
RSTP | Rapid Spanning Tree Protocol | 2 seconds
PRP | Parallel Redundancy Protocol | 0 seconds
HSR | High-availability Seamless Redundancy | 0 seconds
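The recovery times in Table 3-2 can be compared with the grace times in Table 3-1 as sketched below. The strict rule used here (recovery time no greater than grace time) is an assumption of this sketch and is stricter than the qualitative statement in the text, which treats switchover delays of seconds as acceptable for SCADA or HMI data; the function name is hypothetical.

```python
from typing import Dict, List

# Typical recovery times from Table 3-2 (seconds).
RECOVERY_TIME_S: Dict[str, float] = {
    "STP": 20.0,
    "RSTP": 2.0,
    "PRP": 0.0,
    "HSR": 0.0,
}

# Typical grace times from Table 3-1 (seconds).
GRACE_TIME_S: Dict[str, float] = {
    "uncritical automation": 10.0,
    "automation management": 2.0,
    "general automation": 0.2,
    "time-critical automation": 0.005,
}


def protocols_within_grace_time(grace_time_s: float) -> List[str]:
    """List the protocols whose typical recovery time fits the grace time."""
    return [name for name, t in RECOVERY_TIME_S.items() if t <= grace_time_s]


for application, grace in GRACE_TIME_S.items():
    print(application, protocols_within_grace_time(grace))
```

Under this rule only PRP and HSR fit the 0.005 s grace time of time-critical automation, which matches the conclusion drawn in the text.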

3.3.2. IEC 61850-9-2 based Process Bus Architectures

Similar to the station bus, bay level and process level redundancy has to be considered

to eliminate all possible single point failures in SAS. To evaluate its reliability, the

components connected to the process bus need to be firstly determined. Each bay has its

dedicated IEDs executing the control and protection functions. The process bus collects

the digitised voltage and current signals from the Merging Unit (MU) and the

instrument transformers (i.e. CT and VT), and then transfers them to the bay level IEDs.

The merging unit is the interface device used to transfer the analogue data from the

instrument transformer into sampled value streams for the substation IEDs. An IEC

61850 compatible IED is normally equipped with an internal clock for time stamping,

providing about 1ms accuracy. External time sources (TS) can be used to provide a

more accurate system-wide time synchronisation in compliance with IEEE 1588 [51].

Ethernet Switches (ESW) are active communication nodes connecting Ethernet

interfaces, which receive, process and forward the Ethernet packets to the specific ports.

Two process bus architectures based on IEC 61850 are considered here; both include

redundancy in the bay level components, as shown in Figure 3-9. Bay Protection Units

(BPU) are normally implemented redundantly (Main1 and Main2) in SAS due to their

critical function in fault detection. In Architecture (2) shown in Figure 3-9, the

process bus communication system and the connected devices are implemented

redundantly. Hence, each bay IED has its independent process bus communication

system. All the bay protection and control IEDs are assumed to be doubly attached

nodes (DANP) and therefore can be connected to either a single local area network

(LAN A, without the dashed part) or two independent LANs (LAN A & LAN B). Only



the level of redundancy in the process bus network is considered. Other possible

solutions with different topologies can be found in [52, 53].

Figure 3-9: Two Process Bus Sensor Network Architectures

3.3.3. Reliability Model of the Substation Automation System

A number of previous publications have assessed the reliability and

availability of the SAS using fault-tree analysis (FTA), reliability block diagram (RBD)

and tie-set methods [52, 53]. Prior to the application of IEC 61850 in substations,

FTA was frequently used to evaluate the reliability of the automation system [54]. The

application of IEC 61850 and other communication protocols has brought more

redundancy to the communication routes; therefore, a combination of RBD and

cut-set methods is normally applied to quantitatively assess the reliability and take account

of all the redundancy considerations in the reliability model. However, most of the SAS

reliability assessment studies have focused on the reliability/availability of the

entire SAS communication architecture rather than that of specific communication services. This

makes it difficult to use the reliability assessment results to evaluate the reliability of the

monitoring, protection and control applications, which are based on particular

communication services.

Consequently, the reliability of different SAS communication services is assessed in

this section. The reliability data is further used for SIPS risk assessment in Chapter 5

and Chapter 6. A reliability assessment method based on analytical reliability block

diagram (RBD) and stochastic Monte Carlo simulation is proposed in this chapter.

Instead of considering all the devices within the SAS, the communication path to

conduct different communication services in the IEC 61850 based digital substations is



studied. The RBD is firstly used to represent the logical connections of the components

needed for each communication service. The reliability assessment method introduced

in the following section is used to estimate the overall reliability of different

communication architectures.

In addition, repair plays a vital role in maintaining the availability of the communication

network. In a repairable system, a fault in an electronic component can be detected by

the self-monitoring system embedded in the IEDs. The faulty component can be either

fixed or replaced in a timely manner. However, not all the failures can be detected in a

timely manner, depending on the cause of the fault. Consequently, the reliability indices of the

SAS are examined with and without consideration of the repair.

Since the SAS architecture consists of a combination of series and parallel subsystems,

the fundamental theoretical analysis of basic system structures consisting of two

components (series & parallel systems) is provided using the following analysis

procedure, see Figure 3-10. The reliability (i.e. probability that the system will be

operating during a specified time interval) and the availability (i.e. the probability that

the system is in the available state at a given time) of the substation automation system

in performing different communication services need to be assessed.

[Figure 3-10 (diagram): (a) components 1 and 2 connected in series; (b) components 1 and 2 connected in parallel.]

Figure 3-10: Basic Two-Component System in (a) Series and (b) Parallel

1) Non-repairable System Reliability Assessment

For non-repairable systems, the failure rate of the component is assumed to be a

constant value $\lambda_i$. This means the repair rate is considered to be zero, which means once

the component enters its failure state, it can never return to the normal state. The failure

of a SAS component is approximated as an exponential distribution with a constant

failure rate ($\lambda_i$). The probability of the component being in a reliable state during a time

interval $t$ (i.e. $R_i(t)$) can be calculated as:

$$R_i(t) = e^{-\lambda_i t} \tag{3-1}$$

$$\lambda_i = \frac{1}{MTTF_i} \tag{3-2}$$

where $MTTF_i$ is the mean time to failure of the component $i$.

Series System: The reliability of a non-repairable system consisting of two components
connected in series is:

$$R_{sys}(t) = e^{-\lambda_1 t} \cdot e^{-\lambda_2 t} \tag{3-3}$$

Parallel System: The reliability of a non-repairable system consisting of two components
connected in parallel is:

$$R_{sys}(t) = 1 - \left(1 - e^{-\lambda_1 t}\right)\left(1 - e^{-\lambda_2 t}\right) \tag{3-4}$$

The mean time to failure (MTTF) for both non-repairable systems can be calculated as:

$$MTTF = \int_0^{\infty} R_{sys}(t)\,dt \tag{3-5}$$
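As an illustration of Equations (3-1) to (3-5), the following minimal Python sketch evaluates the two-component series and parallel cases. The closed-form MTTF expressions are obtained by integrating Equations (3-3) and (3-4) term by term; the function names and the input values (MTTFs of 100 and 300 years) are illustrative assumptions rather than part of the assessment itself.

```python
import math


def component_reliability(lam: float, t: float) -> float:
    """Equation (3-1): survival probability of one component at time t."""
    return math.exp(-lam * t)


def series_reliability(lam1: float, lam2: float, t: float) -> float:
    """Equation (3-3): both components must survive."""
    return component_reliability(lam1, t) * component_reliability(lam2, t)


def parallel_reliability(lam1: float, lam2: float, t: float) -> float:
    """Equation (3-4): the system fails only if both components fail."""
    q1 = 1.0 - component_reliability(lam1, t)
    q2 = 1.0 - component_reliability(lam2, t)
    return 1.0 - q1 * q2


def series_mttf(lam1: float, lam2: float) -> float:
    """Equation (3-5) integrated in closed form for the series case."""
    return 1.0 / (lam1 + lam2)


def parallel_mttf(lam1: float, lam2: float) -> float:
    """Equation (3-5) integrated in closed form for the parallel case."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)


# Illustrative values only: MTTFs of 100 and 300 years, via Equation (3-2).
lam_a, lam_b = 1.0 / 100.0, 1.0 / 300.0  # failures per year
mttf_series = series_mttf(lam_a, lam_b)
mttf_parallel = parallel_mttf(lam_a, lam_b)
```

With these illustrative inputs the series MTTF is 75 years and the parallel MTTF is 325 years, which shows how strongly a single redundant path extends the time to system failure even without repair.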

2) Repairable System Reliability Assessment

A Markov model is used to represent the different operating states of a repairable
system. For a system consisting of two fundamental components, there are four possible
states in which the system can exist, as shown in Figure 3-11. ‘U’ and ‘D’ represent the
component up and down states respectively. The reliability of a series system and of a
parallel system is assessed in turn.

Figure 3-11: 4-State Markov Model

Series System: In the case of a series system, State 1 represents the system up state
while the other three states are the down states. The system failure rate is obtained by
adding all the transition rates from State 1 to the other three states:

$$\lambda_{sys} = \sum_{i=1,2} \lambda_i = \lambda_1 + \lambda_2 \tag{3-6}$$

$$MTTF = \frac{1}{\lambda_{sys}} \tag{3-7}$$

Parallel System: For a parallel system, States 1, 2 and 3 represent the system up states
while State 4 is the system down state, since the failure of one component does not affect
the successful operation of the entire system. The probability of being in the failure state
($p_f$) and in a success state ($p_s$) for a parallel system can be estimated as:

$$p_f = p_{f_1}\, p_{f_2} = \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i} \tag{3-8}$$

$$p_s = 1 - p_f = 1 - \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i} \tag{3-9}$$

where $\mu_i$ is the repair rate of component i.

The repair rate for the system can then be obtained by adding all the transition rates

departing the failure state (i.e. state 4):

$$\mu_{sys} = \sum_{i=1,2} \mu_i \tag{3-10}$$

In steady state, the frequency of transitions into the success states equals the frequency
of transitions into the failure state. The concept of an equivalent transition rate can
therefore be used to calculate the system failure rate of the parallel system. The system
transition frequency to the failure state $f_f$ and the frequency to the success state $f_s$
can be obtained as:

$$f_f = f_s = p_f\, \mu_{sys} = \prod_{i=1,2} \frac{\lambda_i}{\lambda_i + \mu_i} \sum_{i=1,2} \mu_i \tag{3-11}$$

The failure rate of the parallel system equals:

$$\lambda_{sys} = \frac{f_f}{p_s} \tag{3-12}$$

Knowing the failure rates of both the series and parallel systems, the MTTF is the
reciprocal of the system failure rate:

$$MTTF = \frac{1}{\lambda_{sys}} \tag{3-13}$$

The availability ($A_{sys}$) of the entire system can be calculated as:

$$A_{sys} = \frac{MTTF}{MTTF + MTTR} \tag{3-14}$$
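The repairable-system procedure of Equations (3-6) to (3-14) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the system MTTR is taken as the mean residence time in the failure state (1 / mu_sys), a year is taken as 8760 hours, and the example components (MTTF of 100 years, MTTR of 24 hours) are chosen only for illustration. The names `ParallelIndices`, `series_failure_rate` and `parallel_indices` are hypothetical.

```python
from typing import NamedTuple

HOURS_PER_YEAR = 8760.0  # assumed conversion factor


class ParallelIndices(NamedTuple):
    p_f: float           # probability of the failure state, Eq. (3-8)
    p_s: float           # probability of a success state, Eq. (3-9)
    mu_sys: float        # system repair rate, Eq. (3-10)
    lam_sys: float       # equivalent system failure rate, Eq. (3-12)
    mttf: float          # Eq. (3-13)
    availability: float  # Eq. (3-14)


def series_failure_rate(lam1: float, lam2: float) -> float:
    """Equation (3-6): either component failure brings the system down."""
    return lam1 + lam2


def parallel_indices(lam1: float, mu1: float,
                     lam2: float, mu2: float) -> ParallelIndices:
    p_f = (lam1 / (lam1 + mu1)) * (lam2 / (lam2 + mu2))
    p_s = 1.0 - p_f
    mu_sys = mu1 + mu2
    f_f = p_f * mu_sys            # Eq. (3-11): frequency into the failure state
    lam_sys = f_f / p_s
    mttf = 1.0 / lam_sys
    mttr = 1.0 / mu_sys           # assumption: mean time spent in State 4
    availability = mttf / (mttf + mttr)
    return ParallelIndices(p_f, p_s, mu_sys, lam_sys, mttf, availability)


# Illustrative component: MTTF = 100 years, MTTR = 24 hours (rates per hour).
lam = 1.0 / (100.0 * HOURS_PER_YEAR)
mu = 1.0 / 24.0
series_mttf_h = 1.0 / series_failure_rate(lam, lam)
par = parallel_indices(lam, mu, lam, mu)
```

Under the MTTR assumption above, the availability from Equation (3-14) coincides with the success-state probability $p_s$ of Equation (3-9), which gives a convenient consistency check on the frequency-balance calculation.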

In a complex system with a combination of both series and parallel sub-systems, the
network reduction method can be used to merge the sub-systems and obtain the
reliability indices for the whole system. The reliability analysis, combined with
sensitivity analysis, helps indicate the most reliable SCN architecture for different
communication applications. In addition, the most critical component in the
communication network, which requires more inspection, can be identified through this
evaluation.
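The network reduction idea can be sketched as two composable helper functions that repeatedly collapse series and parallel sub-systems into a single equivalent block. The block layout and the reliability values below are hypothetical and serve only to show the mechanics; they do not reproduce any of the studied architectures.

```python
import math


def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the block reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work: one minus the product of unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical block reliabilities at some mission time (illustrative only).
r_mu, r_em, r_sw, r_ied = 0.99, 0.995, 0.99, 0.98

# Step 1: merge the two redundant IEDs into one equivalent block.
ied_pair = parallel(r_ied, r_ied)

# Step 2a: single process path (MU - EM - SW) in series with the IED pair.
single_path = series(r_mu, r_em, r_sw, ied_pair)

# Step 2b: duplicated process path, reduced to one block, then put in series.
process_path = series(r_mu, r_em, r_sw)
duplicated_path = series(parallel(process_path, process_path), ied_pair)
```

Because each helper returns a plain reliability value, nested calls mirror the nesting of the reliability block diagram, so a larger diagram can be reduced from the innermost sub-system outwards.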

An overall knowledge of the SAS physical layout, reliability data and maintenance

strategy is necessary in assessing the reliability of the SAS. Knowing the main

components in the SAS and its hierarchical topologies, eight SAS architectures are

proposed considering all the possible combinations of process bus and station bus

structures:

Arch1: Single star station bus & single process bus

Arch2: Single ring station bus & single process bus

Arch3: Double star station bus & single process bus

Arch4: Double ring station bus & single process bus

Arch5: Single star station bus & duplicated process bus

Arch6: Single ring station bus & duplicated process bus

Arch7: Double star station bus & duplicated process bus

Arch8: Double ring station bus & duplicated process bus

Of these, the first four SAS architectures have no redundant bay communication
network, whereas the last four do. Moreover, Arch 7 and Arch 8 are fully redundant from
the station bus down to the process bus, which eliminates all possible single points of
failure in the SAS.
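The eight architectures follow directly from combining four station bus structures with two process bus structures. A small Python sketch of this enumeration is given below; the dictionary keys and the `fully_redundant` flag are naming assumptions introduced for illustration.

```python
from itertools import product

STATION_BUSES = ["Single star", "Single ring", "Double star", "Double ring"]
PROCESS_BUSES = ["single", "duplicated"]

architectures = []
# The process bus is the outer loop so that Arch1-4 use the single process
# bus and Arch5-8 use the duplicated process bus, as in the list above.
for number, (process_bus, station_bus) in enumerate(
        product(PROCESS_BUSES, STATION_BUSES), start=1):
    architectures.append({
        "name": f"Arch{number}",
        "station_bus": station_bus,
        "process_bus": process_bus,
        "fully_redundant": station_bus.startswith("Double")
                           and process_bus == "duplicated",
    })

labels = [
    f"{a['name']}: {a['station_bus']} station bus & {a['process_bus']} process bus"
    for a in architectures
]
fully_redundant = [a["name"] for a in architectures if a["fully_redundant"]]
```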

3.3.4. Reliability Data

To objectively evaluate different SAS architectures, the reliability parameters of the

components need first to be agreed upon. Table 3-3 shows the Mean Time to Failure

(MTTF) and Mean Time to Repair (MTTR) values for all the components used in the

substation. The reliability data are based on the previously published documents and the

IEEE reliability standards [52, 53, 55, 56].

The bay IEDs and the external time sources (TS) are considered relatively unreliable
devices due to the large amount of hardware, software routines and settings they
contain. In addition, GPS time reference signals can easily be jammed, blocked or
interfered with. The reliability of an Ethernet switch (SW) depends on the number of
ports it employs. Consequently, the station switch for the ring architecture is more
reliable than the station switch for the star architecture, which requires more Ethernet
interfaces. The reliability and cost figures for the Ethernet Media (EM) also depend on
the geographic distribution (cable length). For the repairable system, it is assumed that
all faulted devices can be detected and fixed or replaced within 24 hours. The
relative costs of the components are rough estimates and are used to reflect the variation
in the cost of the SAS introduced by implementing different levels of redundancy.

Table 3-3: Substation Component Reliability Data

Devices             MTTF (years)   MTTR (hours)   Relative Cost
Bay P&C IEDs        100            24             10
MU                  300            24             4
TS                  100            24             4
IED SW              500            24             3
Bay SW              300            24             4
Station SW (star)   250            24             5
EM (bay level)      800            24             0.1
EM (Middle)         600            24             0.2
EM (Long)           400            24             0.4
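To show how the tabulated data feed the reliability equations, the sketch below converts each MTTF into a constant failure rate, a reliability at a fixed mission time and a steady-state availability. Two assumptions are made here and are not taken from the table: a year is counted as 8760 hours, and a mission time of 10^4 hours is used. The function name `component_indices` is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0   # assumed conversion factor
MISSION_TIME_H = 1.0e4    # assumed mission time in hours

# Device: (MTTF in years, MTTR in hours), copied from Table 3-3.
TABLE_3_3 = {
    "Bay P&C IEDs": (100, 24),
    "MU": (300, 24),
    "TS": (100, 24),
    "IED SW": (500, 24),
    "Bay SW": (300, 24),
    "Station SW (star)": (250, 24),
    "EM (bay level)": (800, 24),
    "EM (Middle)": (600, 24),
    "EM (Long)": (400, 24),
}


def component_indices(mttf_years: float, mttr_hours: float):
    """Return (failure rate per hour, reliability at mission time, availability)."""
    mttf_hours = mttf_years * HOURS_PER_YEAR
    failure_rate = 1.0 / mttf_hours                        # Equation (3-2)
    reliability = math.exp(-failure_rate * MISSION_TIME_H)  # Equation (3-1)
    availability = mttf_hours / (mttf_hours + mttr_hours)
    return failure_rate, reliability, availability


indices = {name: component_indices(*data) for name, data in TABLE_3_3.items()}
```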

3.4. Reliability Assessment of SAS Communication Services

The advent of IEC 61850 standardized the communication services within the
substation and therefore fulfils the interoperability requirements. Three main
applications are covered by the communication services according to IEC 61850:

1) Communication between a bay IED and a substation level client (HMI, NCC
gateway or substation host), e.g. control and reporting services.

2) Communication between different bay IEDs, e.g. interlocking by Generic Object
Oriented Substation Event (GOOSE) messages.

3) Transmission of digitized data from a Merging Unit to IEDs and of GOOSE messages
from an IED to a Circuit Breaker.

Consequently, to evaluate the reliability of SAS communication architectures, it is

necessary to specify the studied communication services, which serve as the basis of



data transfer in SAS. In this section, the reliability assessment models for different

communication services are built, namely the two-terminal and multi-terminal

communication reliability assessment models. The evaluation is accomplished through

quantitatively assessing the reliability of the communication path using the described

reliability assessment method.

3.4.1. Reliability of Two-terminal Communication

The client-server service can be considered as a two-terminal communication mainly

between a bay unit and a station client (HMI, NCC gateway or substation host). One of

the most important communication services is the reporting function which transfers

information including measurements, targets and switchgear status from a bay unit (i.e.

server) to a station client. Data acquired by the substation level client can be used for

different applications such as energy management and wide area monitoring and control,

etc.

The reliability block diagrams (RBD) are used to describe the logical connections of

components needed to fulfil the reporting service for the eight proposed SCN

architectures. Examples of the communication path for the reporting service from a bay

unit to the substation client for different SCN architectures are shown in Figure 3-12.

(a) Arch1: Single star station bus & single process bus
(b) Arch2: Single ring station bus & single process bus
(c) Arch3: Double star station bus & single process bus
(d) Arch4: Double ring station bus & single process bus
(e) Arch5: Single star station bus & duplicated process bus
(f) Arch6: Single ring station bus & duplicated process bus
(g) Arch7: Double star station bus & duplicated process bus
(h) Arch8: Double ring station bus & duplicated process bus

Figure 3-12: Reliability Block Diagram of different SAS Architectures for Reporting Service

The components used to fulfil a function are put in series, while the duplicated

communication paths in LANs and process bus are put in parallel due to the seamless

redundancy provided. The Merging Unit of the local bay digitalizes all the current and

voltage samples and transmits them to the bay IEDs through the process bus in a time-

synchronised manner. The trip signals from the protection IEDs can then be further

transferred to the station client through the substation communication network and will

be sent to the National Control Centre (NCC) for various applications. Assuming the
failure of each component is exponentially distributed, the reliability of the SAS at a
mission time of $10^4$ hours can be estimated using Equations (3-1) to (3-6). The MTTF of
performing a reporting service in the different SAS architectures is evaluated both with
and without repair. The MTTF without repair is the expected time until the SAS fails to
perform the communication service when no component is repaired. The MTTF with
repair is the expected time until a further component failure occurs before an earlier
fault has been fixed, so that the entire system is declared unavailable. These two
reliability indices are calculated using Equation (3-5) and Equation (3-13) respectively.

Table 3-4: Reliability Assessment Results for Reporting Service

Architecture   Reliability (%)   MTTF without repair (years)   MTTF with repair (years)
Arch 1         97.70             37.2                          53.09
Arch 2         98.04             38.25                         63.82
Arch 3         99.20             47.92                         151.85
Arch 4         99.21             48.39                         151.86
Arch 5         98.41             40.05                         81.62
Arch 6         98.76             41.19                         110.06
Arch 7         99.92             51.78                         321274
Arch 8         99.93             52.31                         364632

Figure 3-13: MTTF & Cost of Considered SCN Architectures

As can be observed in Figure 3-13, the SCN architectures with high overall reliability

and high relative cost are located at the top-right corner of the graph, whilst the SCN

architectures with low reliability and low relative cost are located at the bottom-left



corner. It can be concluded that introducing redundancy at both process level and station

level could enhance the reliability of the reporting service in different SAS architectures.

The application of RSTP/HSR ring LAN architecture delivers higher reliability due to

its inherent redundancy in the communication path. In addition, the PRP based double-

star and double-ring LAN station bus could significantly facilitate the transfer of the

monitored data from bay IEDs to the station clients. Nevertheless, the implementation

of duplicated process bus in each bay and the station bus will significantly increase the

cost of the SAS (e.g. Arch4 versus Arch8). In practice, it is necessary to assess the

system reliability together with the actual cost of the equipment and its maintenance

cost.

A timely component repair could significantly increase the MTTF of the studied

communication service in all the SAS architectures. The enhancement in the

performance is especially obvious for the architectures with full redundancy from the
station bus down to the process bus (i.e. Arch7 and Arch8). For example, by repairing
the defective component in the SAS, the MTTF of Arch8 increases remarkably from 52.31
to 364632 years. This emphasizes the importance of maintenance testing and of the
self-monitoring or self-testing functions deployed by the devices in maintaining system

reliability. In addition, this two-terminal communication reliability model can also be

applied to evaluate the reliability of bay switchgear monitoring and controlling, and

communication between two bay level IEDs.

3.4.2. Reliability of Multi-Terminal Communication

The application of Internet Group Management Protocol (IGMP) [57] allows

multicasting data (e.g. GOOSE message) to be filtered and then transferred only to

designated IEDs. Multicast communication plays a vital role in executing distributed

functions such as interlocking, auto-reclosing and breaker failure protection (BFP). This

type of application requires the exchange of time-critical multicast Ethernet frames from a

local bay to multiple recipients located in different bays (usually more than two)

via LAN. Consequently, a multi-terminal communication model is developed to assess

the reliability of the signal path for multicast communication.

The breaker failure protection is taken as an example of substation multicast

communication and its reliability is studied in this section. Given the importance of the

Power System protection and considering the fact that the primary circuit breaker might



fail to operate, the breaker failure protection is often implemented to enhance the

dependability of the protection system. To assess the reliability of the BFP, different

substation arrangements need to be considered as illustrated in Figure 3-14. When a

fault occurs on the transmission line between station B and C, if breaker 3 fails to clear

the fault, BFP fault clearing requires the tripping of different circuit breakers for each

arrangement. Consequently, different consequences on system integrity will be caused.

Figure 3-14: Breaker Failure Protection for Different Station Arrangements: (a) Single

Bus at Station B. (b) Ring Bus at Station B. (c) Breaker-and-a-half at Station B.

For the single bus arrangement as shown in Figure 3-14 (a), a failure of breaker 3

requires the tripping of all the breakers connected to Bus B (e.g. Breaker 2, 5 and 7) to

isolate the fault. This will split the system at bus B. For the ring bus arrangement as

shown in Figure 3-14 (b), the control logic requires the tripping of breaker 3 and 5 to

clear the fault. The misoperation of the breaker 3 requires the BFP function to trip

breaker 2. This leaves the fault connected to line AB and thus requires the tripping of


breaker 1 by remote backup. Consequently, the transmission line AB will be left
out of service due to the BFP. For the breaker-and-a-half arrangement shown in
Figure 3-14 (c), the BFP must trip breaker 2 when breaker 3 fails to operate. This
arrangement allows the system to clear the fault while keeping all the other lines
connected to Bus B in service.

To implement the BFP application, the local bay where the BFP function resides must send
trip signals to the other relevant bays via the LAN to successfully execute the backup
protection function. The impact of BFP on system integrity is affected by the station
arrangement. However, from the secondary SAS point of view, the probability of
successful execution of the BFP function is only affected by the number of breakers
required to be tripped. Assuming that N circuit breakers need to be tripped by BFP to
clear the fault,

the multi-terminal communication path for executing the BFP function in a double-star

LAN is shown in Figure 3-15. Redundancy in the station level LANs provides parallel

communication paths for the multicast Ethernet packets from the local bay to the

destined bays via either LAN A or LAN B.

Figure 3-15: Communication Path of Arch7 for Distributed Function

As indicated in Figure 3-14, three relevant breakers in Substation B need to be tripped
by the BFP function for the single bus station arrangement, whereas only one breaker
needs to be tripped for the other two station arrangements in case of a failure of breaker
CB3. The reliability block diagram becomes insufficient for evaluating multicast
communication due to the complex multi-terminal communication path. Therefore, a
stochastic Monte Carlo simulation based method is used to calculate the reliability of
the studied communication service. The reliability of each component can be calculated
using Equation (3-1), assuming the time to failure of the component is exponentially
distributed. The reliability at a mission time of $10^4$ hours is studied.
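A minimal sketch of such a Monte Carlo estimate is given below. The communication path used here is a simplified, hypothetical one (a sending IED and its bay switch, two redundant station LAN switches, and N recipients each consisting of a bay switch and an IED); it is not the full path of Figure 3-15. The MTTF values are taken from Table 3-3, while the 8760 hours per year conversion, the 10^4 hour mission time, the number of trials and the random seed are assumptions.

```python
import math
import random

HOURS_PER_YEAR = 8760.0   # assumed conversion factor
MISSION_TIME_H = 1.0e4    # assumed mission time in hours


def survival(mttf_years: float) -> float:
    """Equation (3-1): probability that a component survives the mission."""
    return math.exp(-MISSION_TIME_H / (mttf_years * HOURS_PER_YEAR))


R_IED = survival(100)
R_BAY_SW = survival(300)
R_STATION_SW = survival(250)


def simulate(n_recipients: int, trials: int = 20000, seed: int = 1) -> float:
    """Estimate the probability that all recipients can be reached."""
    rng = random.Random(seed)

    def up(reliability: float) -> bool:
        # One Bernoulli draw: True if the component is up at mission time.
        return rng.random() < reliability

    successes = 0
    for _ in range(trials):
        sender_ok = up(R_IED) and up(R_BAY_SW)
        lan_ok = up(R_STATION_SW) or up(R_STATION_SW)  # LAN A or LAN B
        recipients_ok = all(up(R_IED) and up(R_BAY_SW)
                            for _ in range(n_recipients))
        if sender_ok and lan_ok and recipients_ok:
            successes += 1
    return successes / trials


def analytic(n_recipients: int) -> float:
    """Closed-form value for this simplified path, used as a cross-check."""
    lan = 1.0 - (1.0 - R_STATION_SW) ** 2
    return (R_IED * R_BAY_SW) ** (n_recipients + 1) * lan


estimates = {n: simulate(n) for n in (1, 2, 3)}
```

For a path this simple the analytic value is available and is used only to check the sampler; the simulation approach becomes useful when the multi-terminal path no longer reduces to series and parallel blocks.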



The reliability of peer-to-peer communication with 1, 2 and 3 recipients was

assessed and the results are shown in Table 3-5 and Figure 3-16.

Based on the simulation results, the reliability of executing a distributed function in a

SAS decreases with the increase in the number of recipients. Therefore, the ring and

breaker-and-a-half station arrangements, which require only one back-up breaker to

execute the BFP function, could provide higher reliability as compared with the single

bus station arrangement (3 back-up breakers).

Table 3-5: Reliability Data for Conducting Distributed Functions

SAS Arch   Reliability (%)
           1 Recipient   2 Recipients   3 Recipients
Arch 1     95.880        93.767         91.756
Arch 2     96.791        95.003         93.254
Arch 3     97.825        96.490         95.113
Arch 4     97.858        96.502         95.170
Arch 5     97.843        96.974         96.152
Arch 6     98.838        98.252         97.682
Arch 7     99.830        99.747         99.641
Arch 8     99.866        99.791         99.708

Figure 3-16: Reliability of SAS to Perform Multi-Terminal Communications

Similar to two-terminal communication in a substation, the implementation of a
redundant process bus and station bus significantly enhances the performance of
peer-to-peer communication. The improvement in reliability is especially obvious when there


are more recipients in the multi-terminal communication path. The single star LAN
architecture (Arch 1 & 5) may not be sufficient for distributed functions since its
station switch is a single point of failure and greatly compromises the reliability of the system.

This may also cause additional latency for the time-critical communication due to the

communication traffic at the central station switch. The fully redundant architectures

(Arch 7&8) have the best performance among all the architectures with a reliability of

99.748% and 99.793% respectively. When the communication network of the SAS is

fully redundant, an increase in the number of recipients results in a smaller reduction in system

reliability.

Only two process bus architectures with different levels of redundancy are considered in

this paper. In addition, the breaker IEDs could be directly connected to the bay switches

instead of via process bus. In this case, a better performance in both reliability and

latency can be achieved since the number of components in the communication path is

reduced. However, this requires the breaker IED to incorporate the BFP related

logic.

3.4.3. Sensitivity Analysis

Due to the high uncertainty of the data used in the reliability assessment, sensitivity

analysis is carried out to determine the impact of the variation in the assumed data on

the risk evaluation results. In addition, sensitivity study can be used to identify the

weakest and most critical components in the system. This could help improve the

overall performance and allow the allocation of enhanced inspection and maintenance

on the critical component. Two methods are used in the sensitivity study:

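Risk Reduction Worth (RRW): the RRW index of a component i measures the variation in
the system unreliability obtained by making the examined component perfect
($\lambda_i = 0$), whilst keeping the failure rates of all other components at their
original values:

$$RRW(i) = \frac{R_{sys}(base)}{R_{sys}(base \mid \lambda_i = 0)} \tag{3-15}$$

Since the index is defined in terms of unreliability, $R_{sys}$ in Equation (3-15) is
understood as the system risk (unreliability), which gives $RRW(i) \geq 1$, consistent
with the values in Table 3-6.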

Wide Range Method: the reliability data of each component is changed over a wide

range to examine the impact of component’s reliability on the overall system reliability.

This method helps identify the component which has the most significant impact on the

reliability of the overall system.
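Both sensitivity methods can be illustrated on a small, hypothetical series system. In the sketch below the RRW index is computed as the ratio of base-case unreliability to the unreliability with one component made perfect, and the wide range method scales one component's MTTF by a set of factors. The three-component system, the 8760 hours per year conversion, the 10^4 hour mission time and the scaling factors are assumptions made for illustration; the MTTF values are taken from Table 3-3.

```python
import math

HOURS_PER_YEAR = 8760.0   # assumed conversion factor
MISSION_TIME_H = 1.0e4    # assumed mission time in hours

# Hypothetical series system; MTTF values (years) from Table 3-3.
BASE_MTTF_YEARS = {"IED": 100.0, "MU": 300.0, "Station SW": 250.0}


def unreliability(mttf_years: dict) -> float:
    """Unreliability of a series system of exponential components."""
    total_rate = sum(1.0 / (m * HOURS_PER_YEAR) for m in mttf_years.values())
    return 1.0 - math.exp(-total_rate * MISSION_TIME_H)


def rrw(component: str) -> float:
    """Risk Reduction Worth: base unreliability over unreliability with
    the examined component made perfect (failure rate set to zero)."""
    perfect = {k: v for k, v in BASE_MTTF_YEARS.items() if k != component}
    return unreliability(BASE_MTTF_YEARS) / unreliability(perfect)


def wide_range(component: str, factors=(0.1, 0.5, 1.0, 2.0, 5.0)) -> dict:
    """System unreliability as one component's MTTF is scaled."""
    results = {}
    for factor in factors:
        scaled = dict(BASE_MTTF_YEARS)
        scaled[component] = BASE_MTTF_YEARS[component] * factor
        results[factor] = unreliability(scaled)
    return results


rrw_indices = {name: rrw(name) for name in BASE_MTTF_YEARS}
ied_sweep = wide_range("IED")
```

In a pure series system the component with the highest failure rate always has the largest RRW; in architectures with redundancy the ranking also depends on where the component sits, which is why the index is evaluated per architecture.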



The RRW index of each SAS component in performing the reporting service is

calculated. Table 3-6 shows the RRW indices of components in Arch1 and Arch8,

which are the two SAS architectures with the lowest and the highest reliability

respectively.

Table 3-6: RRW of each component in Arch 1&8

Arch     EM     SW     TS     MU     IEDs
Arch 1   1.58   1.83   1.01   1.19   1.01
Arch 8   1.39   1.44   1.24   1.25   1.93

A higher RRW index indicates that the component has a greater impact on the overall

system reliability. For Arch1, the highest overall reliability enhancement is achieved by

improving the reliability of the station switch in the Star LAN, which is the single-

point-of-failure in the communication system. The dominating impact of the central

switch on SAS reliability can be effectively reduced by introducing ring LAN

architecture as shown in Arch 8. In that case, the IEDs, which were considered as the

least reliable devices, have the greatest impact on the overall reliability ($RRW_{IED} = 1.93$).

Therefore, it can be concluded that the importance of a device depends on its reliability,

quantity, the location in the system and the overall communication architecture.

Figure 3-17: Impact of MTTF on System Unreliability for Arch 1

Figure 3-17 shows the variation in system reliability when the Wide Range Method is
applied to the assessment results of Arch1. The MTTF of each SAS component is
varied over a wide range from 0.1 to 5 times its original value ($MTTF_{base}$). In
general, the reliability of the communication service increases as the MTTF of each
component increases. Consistent with the RRW index, it can be observed that the
reliability of Arch1 is most sensitive to the Station Ethernet Switch. Increasing the
reliability of this component leads to the greatest enhancement in system reliability,
whilst a decrease in its reliability significantly compromises the overall reliability.
Devices like the Ethernet Media (EM), although highly reliable, still have a high
impact on system reliability for both architectures due to the large number used
compared with other components.

3.5. Summary

ICT infrastructure plays a vital role in the economical and reliable operation of Power

Systems. It also helps to improve the resilience of Power Systems against stressed

conditions and wide area disturbances. This chapter provides an overview of the

information communication technology used in SIPS application. A detailed description

of the communication architecture from the perspective of the wide-area communication

network down to the IEC 61850 based substation automation system is provided. It is

vitally important to effectively assess the reliability of the communication architecture

to ensure a successful SIPS operation.

The IEC 61850 based substation is a node of SIPS which collects measurements and

implements control actions. A reliability assessment method based on both analytical

and stochastic methods is developed in this chapter to quantitatively assess the

reliability of various communication services in a SAS. Redundancy in the station bus

and process bus communication network is implemented in accordance with specific

protocols.

It is shown that the RSTP based ring station bus architecture is a reliable and cost
effective solution to improve the performance of substation communication. It provides
significantly enhanced reliability in performing distributed functions which require
multicast communication. In contrast, the reliability of the single star LAN
architecture may not be sufficient for distributed functions since the station switch
is a single point of failure, which may greatly compromise the reliability of the system
and cause additional latency for time-critical communication. The implementation of
an IEC 62439-3 PRP based duplicated station bus (i.e. double star and double ring) is an
effective solution for fulfilling the availability and performance requirements of the
communications in the SAS. Redundancy in the process and bay level components also
significantly improves the reliability of the substation communication. The duplicated
process buses, implemented in accordance with the IEC 61850-9-2 protocol, introduce
additional communication paths between bay protection and control IEDs and process
level devices. Since redundancy has always been regarded as an expensive reliability
enhancement method, it should be applied only to mission-critical components. In
addition, repair has a vital role in maintaining the reliability of the substation
automation system. If a defective component can be fixed or replaced in a timely
manner, the MTTF and availability of the system can be significantly increased.

The sensitivity study could help identify the most critical device in terms of maintaining

the reliability of different substation communication architectures. More maintenance

and inspection effort could then be allocated on these critical devices. The use of the

wide range method and RRW in sensitivity analysis indicates that the impact of a

component on the reliability of the SAS depends on the component’s position in the

system, its reliability and the quantity used.


CHAPTER 4

PROTECTION AND CONTROL ASSET

END-OF-LIFE ANALYSIS

4.1. Introduction

The previous chapters discussed the application of advanced information and

communication technology (ICT) and raised concerns about their impact on reliability

of numerical protection systems. However, the ageing of protection equipment is

another major challenge faced by utilities. For example, protection equipment based on

the IEC 61850 protocol has not been widely applied on the UK transmission network.

UK National Grid has approximately 1,200 circuit bays associated with its main

interconnecting transmission lines. These bays predominantly utilize electronic based

protection equipment (i.e. analogue or early numerical relays) to detect and clear short-

circuit faults. A significant number of these protection devices are now reaching their

design lifetime, and consequently the protection and control systems are expected to

become less reliable, with an increased number of ageing related failures. This could

potentially lead to a degraded system performance, since the aged protection devices

may not be able to provide the effective measurements required for emergency control

during a system disturbance.


Due to the critical fault clearance function of protective relays, they must be maintained

in the most reliable state, and must be replaced before they show a pattern of

maloperation, indicating the end of life has already been reached. Consequently, it is

critical for UK National Grid to effectively assess the operational condition of these

relays and predict their expected reliable service life. The current anticipated life of the

protection and control asset and National Grid replacement policy is based on the

manufacturers’ information, on operational experience with prior generations of similar

equipment, and on generic industry observations. This is known as the “manufacturer-defined end-of-life”. However, this may not reflect the actual relay end-of-life since it

does not include actual operating and environmental conditions of the equipment.

Therefore, it is critical to evaluate whether the replacement ages specified in the current

policy reflect the actual relay end-of-life and yield the best predicted reliability of

service and use of National Grid resources.

According to the IEEE Power System Relaying and Control Committee (PSRC) [58],

the end of expected life of a protection, control or metering device (i.e. device actual

end-of-life) is determined as a time in its lifecycle when any of the following stages are

reached:

1) The device is not able to perform as per its design specification and it is not

possible to repair.

2) The device has less technical support (parts, spares and expertise) due to product

obsolescence and the cost of repair outweighs the benefits of a newer device.

3) The device is no longer useful and no longer meets present functional requirements.

The end of expected life is determined not only by identified deterioration in condition

or performance, but also by the reduced availability of technical support (parts, spares

and expertise) due to product obsolescence. The useful life is described by IEC as “the

time interval beginning at a given moment in time, and ending when the failure intensity

becomes unacceptable or when the item is considered to be unrepairable as a result of a

fault (IEV 191-19-06).” Consequently, if the device has enough technical support or

sufficient spares are available and is able to meet the required functions, the end-of-life

of the device is when its failure intensity becomes unacceptable.



Reliability and lifetime assessment techniques have historically been based on statistical

failure rate models obtained from historical field failure rates. One commonly used

method to predict the end-of-life is the “bathtub curve” shown in Figure 4-1, in which
component failures are recorded against the age at which they occur. The high
infant mortality in the first stage of the bathtub curve is caused by defects designed
or built into the product. Failures during the useful life period occur randomly; in this
period the failure rate stays constant and relatively low.

Failures in this stage are not ageing related. However, the failure rate is expected to

increase dramatically once the product exceeds the reliable service life and enters the

end-of-life stage.
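The three regions of the bathtub curve can be illustrated with a simple hazard-rate sketch. The model below is an assumption introduced purely for illustration: it adds a decreasing Weibull hazard (shape below 1) for infant mortality, a constant hazard for random failures, and an increasing Weibull hazard (shape above 1) for wear-out. All parameter values are hypothetical and are not fitted to any relay data.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale)**(shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative bathtub-shaped failure rate (failures per year)."""
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=50.0)  # decreasing
    random_failures = 0.01                                             # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=40.0)          # increasing
    return infant_mortality + random_failures + wear_out


early, mid_life, late = (bathtub_hazard(t) for t in (0.1, 10.0, 40.0))
```

With these hypothetical parameters the failure rate is highest shortly after commissioning, falls to a low plateau during the useful life period, and rises again as the wear-out term starts to dominate.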

Figure 4-1: Bathtub Curve for End-of-life Assessment

The Protection and Control Asset Life Extension (ALE) project carried out by UK

National Grid is introduced in this chapter. The aim of the project is to identify the

critical life-limiting elements within electronic protection devices and to establish

assessment and testing criteria to determine the deterioration mechanisms and rates,

with a goal of determining if “apparently-reliable” electronic relays could sustain good

performance for more years than their previously defined useful life.

4.1.1. Literature Review on End-of-Life Assessment

Lifetime assessment becomes increasingly crucial in the reliable operation of the Power

System. Considerable efforts have been devoted to evaluate the useful lifetime for

Power System components including transformers, cables, breakers, capacitors, reactors,

etc. In addition, the importance of incorporating ageing related failures in system
reliability evaluation has been emphasized, as this could facilitate decision
making in areas such as transmission development planning, transmission operation
planning, selection of substation configurations and reliability-centred maintenance [59].

Currently, most lifetime assessments in the Power System have focused on components of the primary system. Ageing related failures of primary equipment are mainly caused by design defects or heavy loading. Unlike primary equipment, a protection relay does not get hot, nor does it suffer failures related to the number of faults (except for the input CTs). With protection and control devices playing an increasingly important role in preserving system reliability, and with more devices approaching their end-of-life stage, a process is required to evaluate the operational condition of the devices and to estimate their reliable service lifetime. The existing methodologies used to determine the end-of-useful life of protection and control devices are reviewed in this section:

1) PSRC Asset Health Index for Protection and Control Devices

A method to assess the health condition of the protection and control devices was

proposed by the PSRC working group. An asset health index (AHI) was used to

estimate the protection lifetime. The following factors which may affect the reliable

service time of the devices were considered [58]:

a) F (Manufacture): The factors that could affect the product end-of-life from the manufacturer’s perspective include the viability of the manufacturer (i.e. the likelihood of the manufacturer existing in the future), past experience of the manufacturer’s performance (e.g. response to issues, quality control, turnaround time for repairs), the technical support, available spares and upgrade support offered by the manufacturer, and the performance of the manufacturer’s similar products.

b) F (Performance): The historical reliability performance of the device, or of devices with similar characteristics, is examined. The Mean Time Between Failures (MTBF), the observed performance during routine testing, the number of maloperations and unscheduled maintenance events, and the self-reported failures can be used to indicate the condition of the device.



c) F (Utility): Factors from utilities which could impact end-of-useful life include:

the financial policy of the utility, the future direction of the company in terms

of other related areas (e.g. the replacement of RTUs, breakers, control IEDs,

etc.), the staff and resources to support the products and the utility’s operating

requirements to reduce the number of outages.

d) F (Industry): Industry experience, trends in device longevity, standards under development and monitored performance will continuously affect the end-of-useful life of a device. For example, the increasing application of Ethernet communication and the advent of the IEC 61850 communication protocol may encourage the replacement of conventional devices before their expected end-of-useful life.

e) F (Device): Non-performance based factors from the device itself may also influence product lifetime, such as the redundancy of the device in the system, vulnerable components in the device (e.g. electrolytic capacitors) and the environment (e.g. temperature, humidity, etc.).

An equation can be developed to integrate all the above factors and quantify the weight

of each factor based on its importance on the actual performance of the device:

$$F(\text{derating}) = \delta \left[ \alpha_1 F(\text{Manufacture}) + \alpha_2 F(\text{Performance}) + \alpha_3 F(\text{Utility}) + \alpha_4 F(\text{Industry}) + \alpha_5 F(\text{Device}) \right] \tag{4-1}$$

where F(derating) is the end-of-useful life de-rating factor, δ is the overall importance of end-of-useful life, and $\alpha_1$ to $\alpha_5$ are the weights of the individual factors. The other factors (i.e. F(Manufacture), F(Performance), F(Utility), F(Industry) and F(Device)) are as defined in the list above. The estimated useful lifetime is the designed expected life of the device multiplied by (1 - F(derating)). The equations and parameters used in the AHI based method provide one way of quantifying the end-of-useful life and of planning for capital investment and equipment replacement, but they are not scientifically determined. The method does not provide a way to assess the operating condition of the protection devices or to identify ageing related degradation within the relay components. In addition, spares for a protection relay may not be important unless the relay is likely to fail. For example, a NOKIA phone bought in the 1990s would probably still work in 2017 (if the batteries were replaced), even though the manufacturer probably stopped making spares in the late 1990s; apart from changing the batteries, nothing is repairable. Consequently, the identification of potentially vulnerable components is vitally important in protection lifetime assessment.
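As a worked illustration of how Equation (4-1) could be applied, the sketch below reads it as a weighted sum of the five factors scaled by δ, and then applies the (1 - F(derating)) multiplier to a design life. The weights, factor scores and δ value are hypothetical numbers chosen for the example; they are not taken from [58], and the function names are assumptions of this sketch.

```python
def derating_factor(delta, weights, scores):
    """F(derating) = delta * sum(weight_i * score_i), a weighted-sum reading of Eq. (4-1)."""
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same factors")
    return delta * sum(weights[name] * scores[name] for name in weights)


def estimated_useful_life(design_life_years, f_derating):
    """Estimated useful life = design life * (1 - F(derating))."""
    return design_life_years * (1.0 - f_derating)


# Hypothetical weights (summing to 1) and factor scores in [0, 1].
weights = {"Manufacture": 0.2, "Performance": 0.4, "Utility": 0.1,
           "Industry": 0.1, "Device": 0.2}
scores = {"Manufacture": 0.5, "Performance": 0.1, "Utility": 0.3,
          "Industry": 0.2, "Device": 0.4}

f = derating_factor(delta=0.5, weights=weights, scores=scores)
print(f"F(derating) = {f:.3f}")
print(f"Estimated useful life = {estimated_useful_life(25, f):.1f} years")
```

The example also makes the limitation noted above visible: the result depends entirely on how the weights and scores are chosen, and nothing in the calculation reflects the physical condition of the relay.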

2) Statistical Analysis

The conventional method for asset lifetime prediction is statistical analysis where

probabilistic distribution functions are formulated based on the historical performance.

The developed probabilistic function can be either parametric or non-parametric [60].

For parametric statistical analysis, the reliability data of the product are fitted to a suitable distribution function such as the exponential, normal or Weibull distribution. The parameters of the distribution function are estimated from the data, and the suitability of the model is checked using a goodness-of-fit

test. The non-parametric method is used when no predefined distribution function can

be used to characterise the data. This method has been introduced in [61]. The statistical

analysis method has been used to model both age related repairable failures [62, 63] and

age related end-of-life failures [64, 65].
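A minimal sketch of a parametric fit is given below. It uses median rank regression with Bernard's approximation, one common way of estimating two-parameter Weibull parameters from a complete set of failure ages; this is a swapped-in textbook technique, not necessarily the estimator used in [60]-[65]. The failure ages are hypothetical, and the R² value is only a crude indicator of fit quality rather than a formal goodness-of-fit test.

```python
import math


def weibull_mrr(failure_ages):
    """Fit a two-parameter Weibull by median rank regression.

    Uses Bernard's approximation F_i = (i - 0.3) / (n + 0.4) and a least-squares
    line through (ln t, ln(-ln(1 - F))). Returns (shape, scale, r_squared).
    """
    ages = sorted(failure_ages)
    n = len(ages)
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx                      # slope of the line equals the shape k
    intercept = mean_y - shape * mean_x    # intercept equals -k * ln(scale)
    scale = math.exp(-intercept / shape)
    r_squared = sxy ** 2 / (sxx * syy)
    return shape, scale, r_squared


# Hypothetical failure ages in years, for illustration only.
ages = [12, 17, 21, 24, 26, 29, 31, 34]
shape, scale, r2 = weibull_mrr(ages)
print(f"shape = {shape:.2f}, scale = {scale:.1f} years, R^2 = {r2:.3f}")
```

A fitted shape parameter above 1 indicates a failure rate that increases with age, which is the signature of wear-out. The sketch assumes every unit in the sample has failed; field data with units still in service would need a method that handles censoring.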

3) Sample Testing

Another approach to assess the product end-of-life is through sample testing on specific

devices. Samples with different service history can be tested and compared to check

whether there is any degradation in the complete relay or a component function. Both

the system level overall product performance and the component condition can be

checked to determine if a device can continuously function for some additional to-be-

determined period of time. Compared to the uncertainty of the statistical analysis

method, the sample test method identifies the root cause of potential failures and

correlates these with physical wear-out and failure mechanisms, thus delivering a more

meaningful and accurate lifetime prediction. However, a clear understanding of the

operation conditions and degradation mechanism of the vulnerable components is

required.

If a sample with age related failure is available, failure analysis can be performed on the

aged product to determine the root cause of the age related failure. However, it is

important to filter protection maloperations caused by ageing from other types of



failures. A detailed protection performance record could also be useful in failure

analysis.

4.1.2. Asset Life Extension (ALE) Project Test Process

The previously discussed methods have some limitations in predicting the reliable service life of protection devices. The asset health index (AHI) method requires a large amount of information from different sources, and it is difficult to determine precisely the weight of each factor on the overall relay performance. In addition, the assessment of asset lifetime should be undertaken during the useful life period of the equipment, that is, before ageing related failures appear in significant numbers. Because protection devices perform extremely reliably, statistically significant failure data for this specific population of equipment will not be available before end-of-life evidence actually appears. Consequently, a generic asset end-of-life investigative process, consisting of statistical analysis, functional testing and invasive examination, is proposed in this chapter to validate or forecast the reliable service life of a particular relay type. As shown in Figure 4-2, the process steps are as follows:

1) Field Performance: The first step is to review the historical records for each

specific relay type to identify any recorded hardware problems. Depending on the

documents available, these records could include relay population, age profile,

maloperation history and causes, failure and repair history, and reports of benchmark

experience from other utilities.

2) Physical Inspection: Next, the conditions of relay samples removed from service

are examined. The disassembled relay modules are inspected visually to check for

cracks, loose or damaged interconnections, heat damage, or signs of corrosion or

contamination. Any components whose industry history, or as-found condition, makes

them targets for further evaluation are listed.

3) Fingerprint Testing: The operational behaviours of the relays are tested to check

whether there is any degradation in functional performances as compared to design

specifications. The operating characteristics are also compared with those of

contemporary replacement relay types to determine if new relays would offer a



meaningful performance improvement that could influence the replacement decision

process.

4) Stress Testing: Test the relay input current transformers (CTs) under simulated in-

service heavy load and fault conditions, looking for thermal stress that could impact life.

In addition, measure the voltage stress on voltage-rated components in the power supply.

5) In-Depth Component Evaluation: The temperature of components within energized

modules is characterized using thermal imaging and non-destructive structural

evaluation techniques designed to identify any potential life-limiting conditions. Hot

components are compared with their rated capabilities, and with electronic product

industry experience related to levels of heating versus reliability impact. Components

requiring further investigation, based on thermal imaging or any other observations, are

examined using three-dimensional x-ray tomographic micro-imaging. These imaging

results show any signs of degradation or wear-out, leading to determination of whether

stressed components are still sound, and whether a specific life extension can be

forecast.

6) FMMEA: Perform failure mode, mechanism and effect analysis (FMMEA) with

regard to the function of each studied component in its relay module and in the overall

operation of the protective relay. The purpose is to identify particular modules or

components most likely to cause a problem with the correct operation of a protection

relay.

7) Conclusions: Analyse the results of all evaluations, to determine if the end of life

replacement requirement dictated in regulatory policy for each relay type can be

extended by five or more years. A life extension recommendation includes any

modification or component replacement action, and a targeted procedure for rechecking

the condition of stressed components after additional years of service.



Field Performance: Review relay population, age profile, maloperations and repair history.

Physical Inspection: Inspect overall product, modules and construction, and components.

Fingerprint Testing: Test the operational behaviours versus specifications and newer products.

Stress Testing: Test components that might have induced thermal stress or voltage stress.

In-Depth Component Evaluation: Use thermal and 3D X-ray imaging to check component degradation.

FMMEA: Determine impact of failure prone components on relay operation.

Conclusions: Recommend a specific life extension for a type of relay and actions for components.

Figure 4-2: ALE Project Investigation Process

4.1.3. Benefits and Risks of Asset Life Extension

The premature replacement of relays before they reach their end of reliable service life

may not be an inherently better plan. Apart from the cost of equipment and system

outages, the newly installed relays will go through the infant mortality stage and must

be commissioned and debugged. In addition, modern microprocessor relay types do not necessarily offer longer service life prospects than previous generations. If a scientific investigation process can be developed and additional asset life can be validated, the following benefits could be achieved:

2) Maintain protection reliability without investment in the installation of new relays

and protection schemes.

3) Reduce outages caused by protection replacement and enhance system reliability: extending protection asset life could effectively reduce the frequency of system interruptions caused by protection replacement. With respect to the “N-1” reliability criterion, when an important feeder is out of service for protection replacement, an outage of other transmission lines could lead to severe system conditions and even cascade tripping.

4) Avoid an increase in the economic cost of energy due to protection replacement: the replacement of protection could lead to the outage of important generation plant. For example, when the protection is being upgraded on a line to an important nuclear station, the output of the nuclear station needs to be rescheduled to more expensive generators (e.g. gas generation), which leads to increased energy cost.

5) Avoid infant failures associated with new replacement relays.

6) Avoid the application risks of new products, including the management of skills and resources.

7) Defer capital investment and release resources for other needs.

However, the following potential risks might be caused by extending the service life of

the existing relays:

1) Potential increase in end-of-life failures.

2) Create accumulated replacement problems within short windows.

3) Require resources and skills to manage ageing equipment.

These risks can be effectively mitigated by carrying out asset life assessment, setting up a pro-active asset replacement strategy, and putting succession planning and training in place.

4.2. UK National Grid Asset Life Extension Project

4.2.1. National Grid Protection and Control Asset Life Extension (ALE) Project

National Grid has approximately 1,200 circuit bays associated with its main

interconnecting transmission lines. These bays predominantly utilize electronic

(analogue or numeric) based protection equipment (protection relays) to detect and clear

faults or short circuits. Applying the current replacement policy will result in one third of the protection equipment in the 1,200 circuit bays being replaced within the next 8 years. This will lead to significant circuit outages, equipment purchase and installation costs, and human resource requirements. However, since most of these

relays are still operating reliably without any sign of degradation, it is critical to

evaluate whether the replacement ages specified in the current policy yield the best

predicted reliability of service and deliver efficient use of National Grid resources.



Therefore, understanding the life-limiting characteristics of these relay types becomes important for National Grid to optimize its replacement plans. The objective is to ensure that National Grid replaces these protection devices neither too early, with unnecessary use of resources, expenditure and system outages, nor too late, with increased risk of ageing related failures, maloperations and unmanageable waves of replacement. To achieve this, a scientific investigation process is required to establish the ageing mechanisms applicable to the specific protection types. This includes functional tests and invasive examination to determine the deterioration mechanisms.

The National Grid Protection and Control Asset Life Extension project comprises the

following major tasks:

1) Develop the detailed scope, schedule and information gathering process.

2) Develop processes and procedures for asset life extension evaluation.

3) Perform tests and investigations according to Task 2 processes and procedures; document raw test results from university test laboratories for each studied relay type.

4) Investigate the operational behaviour of each studied relay type; identify life

limiting elements; determine whether reliable service life can be extended and issue

action plans.

5) Document processes and procedures for asset life extension evaluation developed

and undertaken by the project team.

4.2.2. Asset Life Extension (ALE) Study of Selected Protection Relays

The objective of the ALE study is to investigate the operational behaviour of three

specific types of protection relays (SHNB, THR and LFCB) and determine if end-of-life

failures have started to occur or might occur in the near future, or if equipment

deterioration is becoming apparent. If the evaluation results indicate no end-of-life

failures or age related deterioration has occurred, then the results can be used to justify

extension of asset life expectation for the considered equipment types.

Table 4-1 shows the current UK National Grid policy on the reliable service lifetime of

different protection generations. Previous work has been carried out by National Grid to assess the reliable lifetime of various electromechanical relays. This enabled National Grid to review the pre-determined reliable service time of the protection devices and revise its replacement plan [66]. The anticipated lifetime of electromechanical relays has been successfully extended from 30 years to 40 years. In this project, the research is extended to cover more complex electronic relays, especially certain models of multifunctional analogue solid state protective relays which have served reliably to date but are nonetheless approaching their presently-rated end-of-life date.

Table 4-1: UK National Grid Policy on Relay Lifetime

| Relay Generation | Equipment Family | Anticipated Asset Lifetime (years) | Replacement Window (years) |
|---|---|---|---|
| Electromechanical | Use mechanical force to operate a relay contact in response to a stimulus. | 40 | - |
| Electronic | Complex relays using transistorised or integrated circuits | 25 | 20-35 |
| Numerical | Digital (A/D converter and microprocessor) | 20 | 10-25 |

The equipment considered ranges from electronic equipment designed in the early 1980s, comprising transistorised circuits, early semiconductors and integrated circuits, to later equipment from the 1990s that contains numeric components including microprocessors and analogue to digital converters. The three studied relay types are described as follows:

1) The Alstom SHNB distance relay: The SHNB Micromho is a static distance protection relay manufactured by Alstom (GEC Alstom) in Stafford (now part of GE). It is designed to provide high speed phase and earth fault protection for high voltage or extra high voltage overhead transmission lines. The SHNB design is based on operational amplifiers and uncommitted logic arrays, and the relay was mainly installed from 1985 to 1995. 156 units remain in service with National Grid.

2) The Reyrolle THR distance relay: The type THR is a multi-zone distance relay manufactured by NEI Reyrolle in Hebburn (now part of Siemens) and was installed in National Grid substations between 1980 and 1990. THR is based on early-1970s discrete transistor circuit design. 184 units remain in service with National Grid.

3) The Alstom LFCB differential relay: LFCB is a transmission line current-differential relaying system based on microprocessor technology with analogue to digital converters, manufactured by GEC, Alstom, or Areva in Stafford (now GE) and supplied to National Grid between 1993 and 2005. 213 units remain in service with National Grid.



The on-site population, installation time and expected lifetime of each relay type are shown in Table 4-2. The two analogue distance protection relays (i.e. SHNB, THR) were mainly installed from 1980 to 2000, whilst the early numerical relay LFCB was first introduced in the early 1990s. National Grid Policy Statement (Transmission) EPS 12.08, Issue 6, September 2011, Page 12 [67] presents the following range of asset lives for the relay types studied. The reliable lifetime defined in the current policy, based on manufacturers’ information and operational experience, gives an anticipated lifetime of 25 years for the SHNB and THR relays, with a replacement window between 20 and 35 years. The LFCB relays have a shorter anticipated lifetime of 20 years, with a replacement window from 10 to 25 years.

Table 4-2: Relay Population and Anticipated Lifetime [67]

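| Relay Type | Quantity (1999) | Quantity (2005) | Quantity (2011) | Quantity (2014) | Installed | Anticipated Lifetime (years) | Replacement Window, Earliest (years) | Replacement Window, Latest (years) |
|---|---|---|---|---|---|---|---|---|
| SHNB | 357 | 318 | 229 | 156 | 1981-2001 | 25 | 20 | 35 |
| THR | 390 | 369 | 231 | 184 | 1979-2002 | 25 | 20 | 35 |
| LFCB | 117 | 245 | 228 | 213 | 1991-2004 | 20 | 10 | 25 |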

Figure 4-3: UK National Grid Relay Age Distribution (by the end of 2014)

The age profile provided by National Grid shows the age distribution of each studied relay type at the end of 2014, as illustrated in Figure 4-3. If the life extension cannot be justified, 29.5% of the SHNB (46 units), 48.4% of the THR (89 units) and 30.0% of the LFCB (64 units) relays will need to be replaced in the next five years (by the end of 2019). This will lead to unmanageable waves of replacement, requiring significant circuit outages, equipment purchase and installation costs, and human resource requirements.
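The quoted replacement shares follow directly from the 2014 populations in Table 4-2 and the unit counts given in the text. The short check below reproduces them; the variable and function names are chosen for this sketch only.

```python
# Units in service at the end of 2014 (Table 4-2) and units falling due for
# replacement by the end of 2019, as quoted in the text.
in_service = {"SHNB": 156, "THR": 184, "LFCB": 213}
due_by_2019 = {"SHNB": 46, "THR": 89, "LFCB": 64}


def replacement_share(due, total):
    """Percentage of the installed population due for replacement."""
    return 100.0 * due / total


shares = {relay: replacement_share(due_by_2019[relay], in_service[relay])
          for relay in in_service}
for relay, pct in shares.items():
    print(f"{relay}: {due_by_2019[relay]} of {in_service[relay]} units = {pct:.1f}%")

total_due = sum(due_by_2019.values())
total_units = sum(in_service.values())
print(f"All three types: {total_due} of {total_units} units = "
      f"{replacement_share(total_due, total_units):.1f}%")
```

Taken together, 199 of the 553 units of these three types would fall due within five years, which is slightly more than one third of the combined population.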

4.2.3. Relay Defect Data Analysis

The National Grid Protection Performance Information (PPI) reports from 2000 to 2013 [68] were reviewed to collect maloperation information for each relay type. A total of 30 maloperations related to SHNB, THR or LFCB relays were recorded during the reporting period. The cause of 7 of them could not be identified. The maloperations caused by relay hardware failures were extracted and studied in detail.

Table 4-3: Maloperations for each Relay Type from 2000-2013

| Relay Type | SHNB | THR | LFCB | Unknown | TOTAL |
|---|---|---|---|---|---|
| All types of failures | 2 | 13 | 8 | 7 | 30 |
| Relay hardware failures | 0 | 7 | 2 | - | 9 |

Table 4-4: Causes of Relay Maloperations

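| Relay Type | Fault Type | Causes | No. |
|---|---|---|---|
| SHNB | Security-based misoperation | Application failure: wrong Zone 2 setting | 1 |
| SHNB | Security-based misoperation | VT fuse failure (external instrument transformer problem) | 1 |
| THR | Security-based misoperation | Power supply failure | 5 |
| THR | Security-based misoperation | Card failure: comparator card | 1 |
| THR | Security-based misoperation | VT fuse failure or Zone 2 card failure | 1 |
| THR | Security-based misoperation | VT fuse failure | 1 |
| THR | Security-based misoperation | Primary fault: lightning damages relay | 1 |
| THR | Security-based misoperation | Application failure: wrong settings | 1 |
| THR | Security-based misoperation | Unknown | 2 |
| THR | Dependability-based misoperation | Unknown | 1 |
| LFCB | Security-based misoperation | Faulty relay card | 1 |
| LFCB | Security-based misoperation | Comms & Processor cards | 1 |
| LFCB | Security-based misoperation | Unknown | 4 |
| LFCB | Dependability-based misoperation | Unknown | 2 |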



It can be seen that SHNB has the best reliability performance, with only 2 maloperations in 14 years, whilst THR has the worst, with 13 maloperations. Of the two SHNB maloperations, one was caused by a wrong Zone 2 setting, which is an application failure rather than a relay hardware failure. The other was caused by a VT fuse failure and occurred because the VT monitoring function of the SHNB had been manually blocked. Therefore, there are no in-service maloperations recorded in the PPI attributable to an SHNB hardware failure or defect. Accordingly, no statistical evidence can be provided to identify any vulnerable components or modules in SHNB relays in service today.

A total of 13 THR in-service maloperations were tracked in the National Grid PPI reports, and 7 of them were attributable to a hardware failure or defect. Five of the 7 hardware failures were due to the failure of a capacitor in the power supply module, which is the most critical unit for THR life extension. The failure of the power supply capacitor causes unwanted tripping rather than inability to trip, which is a design characteristic of the THR. These problematic capacitors have already been replaced in the power supply module of all the THR units in service, in accordance with the Equipment Modification Instruction (EMI) 997 replacement procedure and program. EMI 997 capacitor replacements were performed in the National Grid Light Current Repair Centre (LCRC) via module rotations, rather than in the field. The other two THR failures were caused by random module component failures with no emerging pattern.

Of the eight LFCB in-service maloperations recorded in the PPI, only two were attributed to relay module failures, and none have occurred recently. Furthermore,

National Grid Transmission Design Circular (TDC) 869 documents an LFCB voltage

regulator integrated circuit failure vulnerability, which has already been addressed by

replacement of all the problem regulators.
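The per-type totals in Table 4-3 can be cross-checked against the cause breakdown in Table 4-4 with a simple tally. The records below are transcribed from Table 4-4; the hardware flag on each record is this sketch's reading of the discussion above (for example, the entry "VT fuse failure or Zone 2 card failure" is assumed to count as a hardware failure so that the THR total reaches 7), not a column of the original table.

```python
from collections import defaultdict

# (relay type, fault type, cause, count, hardware-related?) from Table 4-4.
RECORDS = [
    ("SHNB", "security", "wrong Zone 2 setting", 1, False),
    ("SHNB", "security", "VT fuse failure", 1, False),
    ("THR", "security", "power supply failure", 5, True),
    ("THR", "security", "comparator card failure", 1, True),
    ("THR", "security", "VT fuse failure or Zone 2 card failure", 1, True),
    ("THR", "security", "VT fuse failure", 1, False),
    ("THR", "security", "lightning damage", 1, False),
    ("THR", "security", "wrong settings", 1, False),
    ("THR", "security", "unknown", 2, False),
    ("THR", "dependability", "unknown", 1, False),
    ("LFCB", "security", "faulty relay card", 1, True),
    ("LFCB", "security", "comms and processor cards", 1, True),
    ("LFCB", "security", "unknown", 4, False),
    ("LFCB", "dependability", "unknown", 2, False),
]


def tally(records):
    """Return (all maloperations, hardware-related maloperations) per relay type."""
    totals, hardware = defaultdict(int), defaultdict(int)
    for relay, _fault, _cause, count, is_hardware in records:
        totals[relay] += count
        if is_hardware:
            hardware[relay] += count
    return dict(totals), dict(hardware)


totals, hardware = tally(RECORDS)
for relay in totals:
    print(f"{relay}: {totals[relay]} maloperations, "
          f"{hardware.get(relay, 0)} hardware-related")
```

Under that assumption the tally gives the same per-type counts as Table 4-3 for both rows.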

4.2.4. Environment Influence

The operating environment is another factor that significantly affects the ageing process of protection and control devices. An appraisal of the environmental temperatures of components during operation was performed based on the ambient temperatures collected over a three year period (Mar 2000 to Mar 2003). Temperature loggers designed to record hourly temperatures were placed close to batteries at 62 substations categorised under three regions: Leeds, Birmingham Weather Station and Heathrow Weather Station. The results are summarised in Table 4-5. The ambient temperature at the substations in the Birmingham area was the highest, exceeding 20°C for 74% of the time. In the Heathrow Weather Station area, by contrast, the ambient temperature stays below 20°C for 80% of the time throughout the year.

Although the data show the overall maximum and minimum temperatures, it is impossible to assess the diurnal fluctuations in ambient temperature for each sampling location from the data available. Thus a proper evaluation of this effect on component and relay reliability cannot be made. Notwithstanding this obstacle, the maximum and minimum temperatures and temperature variations recorded are judged not to affect component reliability and lifetime disproportionately, as they fall well within the acceptable ambient operating temperature limits of the individual components.
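Statistics of the kind quoted above (the share of time above 20°C, and the lowest and highest readings with their difference) can be derived from an hourly temperature log as sketched below. The readings are hypothetical and the function names are assumptions of this example; the original logger data are not reproduced here.

```python
def fraction_above(readings_c, threshold_c=20.0):
    """Fraction of logged readings strictly above the threshold."""
    if not readings_c:
        raise ValueError("no readings supplied")
    return sum(1 for r in readings_c if r > threshold_c) / len(readings_c)


def temperature_span(readings_c):
    """Return (lowest, highest, highest - lowest) for a series of readings."""
    low, high = min(readings_c), max(readings_c)
    return low, high, high - low


# Hypothetical hourly readings in degrees Celsius, for illustration only.
sample = [18.5, 19.0, 21.5, 23.0, 24.5, 22.0, 20.5, 19.5]

share = fraction_above(sample)
low, high, span = temperature_span(sample)
print(f"{share:.1%} of readings above 20 C")
print(f"lowest {low} C, highest {high} C, fluctuation {span} K")
```

Because equally spaced hourly readings each represent the same duration, the fraction of readings above the threshold approximates the fraction of time above it.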

Table 4-5: Summary of Ambient Temperatures Recorded over a Period of One Year

| Area | Smallest fluctuation (ΔT) / K | Largest fluctuation (ΔT) / K | Lowest Temperature / °C | Highest Temperature / °C |
|---|---|---|---|---|
| Leeds | 9 (CHTE 275 Battery Room Amb) | 28 (Staythorpe) | +4 | +36 |
| Birmingham Weather Station | 3 (Willington 275 Battery Rm Amb) | 28 (Enderby Room ambient; Seabank outdoor ambient) | +5 | +42 |
| Heathrow Weather Station Area | 4 (Leatherhead JFHTC Room power equipt) | 27 (BRWE1, Diesel House Ambient) | +1 | +30 |

4.3. Laboratory Evaluation Results on Selected Relays

In this section, the previously described Asset Life Extension (ALE) process is applied to the selected relay types. The laboratory evaluation results are then analysed and interpreted to evaluate the condition of the protection devices and to issue life extension processes and action plans. The source and commissioning history of the evaluated relay samples are shown in Table 4-6. For each studied relay type, samples with different in-service times were removed from the system and used for the laboratory study.



Table 4-6: Relay Samples used for Laboratory Testing

| Type | Serial No. | Original Location | Commissioned | Replaced | Service Age (years) |
|---|---|---|---|---|---|
| SHNB 101 | 002838P | Ex Whitson 275kV Cardiff East-Uskmouth Circuit FPFM | 1986 (estimated) | 2011 | 26 |
| SHNB 102 | 784167D | Ex Upper boat 275 S.Stn - Cilfynydd 2 Circuit FPSM | 1993 | 2005 | 13 |
| THR | 97434/1 | Ex Berkswell 275kV S.Stn - Feckenham Circuit FPSM | 1980 (estimated) | 2012 | 33 |
| LFCB 103 | 208284J | Ex Creyke Beck 400kV Keadby-Killingholme circuit FPFM | 1998 | 2006 | 9 |
| LFCB 103 | 547373C | Ex Greystones B 275 S.Stn – Lackenby 3 Circuit FPFM | 1991 | 2006 | 16 |

The evaluation encompassed tests on two different versions of the SHNB MICROMHO static distance relay (SHNB 101 (silver) and SHNB 102 (black)). As shown in Table 4-6, a heavily used SHNB 101 relay sample with an in-service time of 26 years from 1986 to 2011, and an SHNB 102 relay with a shorter in-service time of 13 years, were used for the characterisation of operational behaviour, thermal imaging and in-depth component study. The THR relay used for the life extension study was commissioned in 1980 and had an in-service time of approximately 33 years. According to National Grid’s current replacement policy on the THR relay, the anticipated life expectancy is 25 years. Therefore, the operational performance and the component conditions of the relay are checked to identify whether there are any signs of ageing related degradation. Two LFCB 103 differential relay samples with different in-service times were tested. The heavily used relay sample has an in-service time of 16 years from 1991 to 2006. The lightly used LFCB sample has a shorter in-service time of 9 years.
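The service ages listed in Table 4-6 are consistent with counting both the commissioning year and the replacement year. This convention is inferred from the table values rather than stated in the source, and the sketch below simply checks it against the five samples; the dictionary keys and function name are chosen for this example.

```python
def service_age(commissioned_year, replaced_year):
    """Service age in years, counting both the first and the last calendar year."""
    return replaced_year - commissioned_year + 1


# (commissioned, replaced) years per sample serial number, from Table 4-6.
samples = {
    "SHNB 101 / 002838P": (1986, 2011),
    "SHNB 102 / 784167D": (1993, 2005),
    "THR / 97434/1": (1980, 2012),
    "LFCB 103 / 208284J": (1998, 2006),
    "LFCB 103 / 547373C": (1991, 2006),
}

ages = {name: service_age(*years) for name, years in samples.items()}
for name, age in ages.items():
    print(f"{name}: {age} years in service")
```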

4.3.1. Laboratory Inspection

The conditions of the relay samples removed from service are first examined.

Disassembled modules of each relay type are visually inspected to check for cracks,

loose or damaged interconnections, heat damage, and signs of corrosion or

contamination. An overview of the overall product, modules, construction components

and technologies for each relay type is also provided.

4.3.1.1. SHNB Visual Inspection Results



The SHNB relay consists of 32 modules, which are made up of printed circuit boards

(PCBs) with discrete components including transistors, voltage regulators, logic array

integrated circuits, operational amplifiers, variable resistors, reed relays, diodes, wire-

wound high power resistors, miscellaneous film resistors, and electrolytic and film

capacitors. An example of the SHNB relay and its zone comparator module PCB is

illustrated in Figure 4-4. The modules are retractable, making it convenient for thermal

imaging, during which extender cards were used to draw out the modules during a short

period of operation (from a “cold” start).

Several components were found to differ between the two relays in terms of packaging design. The ‘older’ SHNB 101 design in particular contained a number of obsolete components, and in some cases it was not possible to obtain datasheets for these. However, both SHNB relay types are identical in their construction and have nearly identical component layouts on equivalent boards. Overall, the SHNB relays examined had a generally good appearance with few scratches on the exterior. The PCBs within each module appeared to be in good condition, and the protective conformal coating on the boards had a good level of sheen with no obvious coating or component discoloration. No obvious cracks or loose or damaged interconnections were identified. There were no obvious signs of heat damage or corrosion, even on the components which experience above-ambient operating temperatures. Thus there were no obvious targets for further evaluation from the visual inspection.

Figure 4-4: SHNB Relay (left) and Its Comparator Module PCB (right)

The wire-wrap technology as used by SHNBs to interconnect circuit modules, without

the need for soldering wires or fabricating backplanes with connectors, was popular for



the manufacture of electronic equipment in the 1970s and 1980s. It provides more

reliable construction compared to other interconnection methods; the insulation on the

connecting wires is penetrated by the sharp corners of the wrapping posts under

pressure to yield 20 to 40 airtight high-pressure metal contact points in parallel. The

connections are less likely to fail due to vibration or physical stress. Because no soldering is involved, solder-related problems such as corrosion, cold joints and dry joints that become intermittent are avoided. Positive industry experience with wire wrap assembly is

aligned with the observation that National Grid experienced no wire wrap failures in

any of their SHNB relays.

4.3.1.2. THR Visual Inspection Results

The type THR is a multi-zone distance relay manufactured by NEI Reyrolle in Hebburn

(now part of Siemens) and installed in National Grid substations between 1980 and

1990. THR is based on early-1970s discrete transistor circuit design. The visual

inspection of the heavily used THR relay with 33 years of in-service time reveals a

generally aged and tired appearance (i.e. scratches on the exterior, distorted and

threaded screws and washers, and dust). There is some discoloration of the protective

lacquer on the PCBs which appears uneven in some parts, and may be due to the lacquer

having been applied manually. No obvious cracks or loose or damaged interconnections

have been identified on the boards, and there are no obvious signs of failure in the

conformal coating (lacquer). There are signs of heat damage on a few resistors.

Figure 4-5: THR Relay (left) and Its Internal PCBs (right)


4.3.1.3. LFCB Visual Inspection Results

LFCB is a transmission line current-differential relaying system based on

microprocessor technologies with analogue to digital converters. These were

manufactured by GEC, Alstom, or Areva in Stafford and supplied to National Grid

between 1993 and 2005. Both LFCB relay samples present a good appearance. The

protective lacquer on the disassembled boards is in good condition. The boards

appeared in good visual condition without any obvious cracks, loose or damaged

interconnections, heat damage, signs of corrosion or whisker growth.

Figure 4-6: LFCB Relay (left) and Its Internal PCBs (right)

4.3.2. Fingerprint Performance Testing

In this section, the operational behaviours of the protective relays are tested to detect

whether there is any degradation in the relay function as compared to design

specifications. The operating time and the reach accuracy of each relay type are

examined. The operating characteristics are then compared with the contemporary

replacement relay types to determine if new relays would offer a meaningful

performance improvement that could influence the replacement decision process.

4.3.2.1. Fingerprint Testing Methodologies

An Omicron CMC 256 test set was used to test the operational performance of the

distance and differential relays. The settings of each relay type and the parameters of the

protected circuits in the 400 kV transmission systems were provided by National Grid

as shown in Appendix A. The methodologies used to test each relay type are described

as follows:

1) Distance Relay (SHNB, THR) Testing Method:

Operational performance of the distance relays is tested via two different approaches,

namely the static and the dynamic fault based testing. For static fault based testing, the


Omicron ‘Distance Relay’ test module is used to simulate different types of fault (i.e.

phase to ground fault, phase to phase fault and three phase fault). All these faults are

automatically simulated by the test set with a constant fault current of 2A and are then

injected into the relays. The relay reach can be measured by inserting test points along

the relay characteristic angle and at edges of each operation zone as shown in Figure 4-7;

the green dots are the inserted test points designed to evaluate the reach accuracy,

sensitivity and operating time of relays with a Mho characteristic. This method is used

by National Grid for their routine tests.
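
The placement of static test points against a Mho characteristic can be illustrated with a minimal sketch. It assumes a simple self-polarised Mho circle whose diameter lies along the relay characteristic angle; the SHNB's partially cross-polarised comparator is not modelled. The function names, the 10 Ω line impedance and the 85° angle are hypothetical values chosen for illustration only and are not taken from the National Grid settings.

```python
import cmath
import math


def mho_operates(z, reach_ohm, angle_deg, offset_ohm=0.0):
    """Return True if impedance z (complex ohms) lies inside a Mho circle.

    The circle diameter runs along the characteristic angle from the
    reverse offset point (-offset_ohm) to the forward reach point.
    """
    direction = cmath.rect(1.0, math.radians(angle_deg))
    forward = reach_ohm * direction
    reverse = -offset_ohm * direction
    centre = (forward + reverse) / 2
    radius = abs(forward - reverse) / 2
    return abs(z - centre) <= radius


def point_on_angle(fraction_of_line, line_ohm, angle_deg):
    """Impedance of a test point placed along the characteristic angle."""
    return fraction_of_line * line_ohm * cmath.rect(1.0, math.radians(angle_deg))


LINE_OHM = 10.0   # hypothetical line impedance magnitude (assumption)
ANGLE_DEG = 85.0  # hypothetical relay characteristic angle (assumption)

# Zone 1 set to 80% of the line: points just inside and just outside the reach.
zone1_inside = mho_operates(point_on_angle(0.79, LINE_OHM, ANGLE_DEG),
                            0.8 * LINE_OHM, ANGLE_DEG)
zone1_outside = mho_operates(point_on_angle(0.81, LINE_OHM, ANGLE_DEG),
                             0.8 * LINE_OHM, ANGLE_DEG)
```

Stepping the test point outwards until the relay stops operating gives the measured reach, which is the quantity reported as a percentage of line impedance in the fingerprint results.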

The limitation of static testing is that it uses a fixed injected current and a stable voltage

to test the relay. Consequently, it cannot reflect the actual waveform of the voltage and

current signals seen during a fault. Therefore, a second approach is also used. As shown

in Appendix A, the PSCAD simulator was used to simulate the double circuit

transmission system using the line parameters provided by National Grid. Different

types of fault are inserted at different positions along the transmission line. The

transient fault current and voltage signals at the relay location are recorded and saved in

a COMTRADE file and then replayed by the Omicron test set and injected into the

distance relay to evaluate its performance.

Figure 4-7: Static Fault based Distance Relay Testing in Omicron ‘Distance Relay’

Module


2) Differential Relay (LFCB) Testing Method:

Figure 4-8 illustrates a dual-slope biased restraint characteristic of the LFCB current

differential relay. For a two-ended line with end A and B, IA-a and IB-a are the time

aligned a-phase current vector signals at ends A and B at a particular time. The

differential and bias current values can be calculated as:

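$$I_{diff\text{-}a} = \left| I_{A\text{-}a} + I_{B\text{-}a} \right| \tag{4-2}$$

$$I_{bias\text{-}a} = \frac{1}{2}\left( \left| I_{A\text{-}a} \right| + \left| I_{B\text{-}a} \right| \right) \tag{4-3}$$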

The protection characteristic is determined by four settings: the basic differential current

setting IS1, which determines the minimum pick-up level of the relay, the lower percentage bias

setting k1, the bias current threshold setting IS2 and the higher percentage bias setting k2.

The tripping criteria can be formulated as:

For $I_{bias} < I_{S2}$: $\quad I_{diff} > k_1 I_{bias} + I_{S1}$ (4-4)

For $I_{bias} > I_{S2}$: $\quad I_{diff} > k_2 I_{bias} - (k_2 - k_1) I_{S2} + I_{S1}$ (4-5)

Figure 4-8: LFCB Dual Slope Bias Characteristics
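
The dual-slope tripping criteria of equations (4-4) and (4-5) can be sketched as a short function. This is an illustrative sketch only: the differential and bias currents follow the usual magnitude-based definitions for this type of relay, the function names are hypothetical, and the IS1 and IS2 values used in the example are assumptions rather than the National Grid settings. The k1 = 30% and k2 = 150% values are the expected settings quoted in the test results.

```python
def diff_and_bias(i_end_a, i_end_b):
    """Differential and bias currents from time-aligned phasors at ends A and B."""
    i_diff = abs(i_end_a + i_end_b)
    i_bias = 0.5 * (abs(i_end_a) + abs(i_end_b))
    return i_diff, i_bias


def operate_threshold(i_bias, i_s1, k1, i_s2, k2):
    """Minimum differential current needed to trip at a given bias current."""
    if i_bias < i_s2:
        return k1 * i_bias + i_s1                    # lower slope, eq. (4-4)
    return k2 * i_bias - (k2 - k1) * i_s2 + i_s1     # upper slope, eq. (4-5)


def lfcb_trips(i_diff, i_bias, i_s1, k1, i_s2, k2):
    """True when the operating point lies above the bias characteristic."""
    return i_diff > operate_threshold(i_bias, i_s1, k1, i_s2, k2)


# Assumed settings: IS1 = 0.2 A and IS2 = 2.0 A are illustrative only.
SETTINGS = dict(i_s1=0.2, k1=0.3, i_s2=2.0, k2=1.5)

# Through load: equal and opposite currents at the two ends, so no trip.
through = lfcb_trips(*diff_and_bias(1 + 0j, -1 + 0j), **SETTINGS)

# Internal fault: both ends feed the fault in phase, so the relay trips.
internal = lfcb_trips(*diff_and_bias(2 + 0j, 2 + 0j), **SETTINGS)
```

The two branches give the same threshold at a bias current of IS2, so the characteristic is continuous at the break point between the two slopes.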

The loop back commissioning test is used to test the performance of the differential

protection function of the LFCB relay. The MITZ 03, which is a stand-alone fibre-optic

to electrical communications interface unit, is switched to the ‘X.21 Loopback’ option

to allow loopback of the X.21 communication signals for relay testing.


Figure 4-9: Connections for LFCB Bias Characteristic Testing

As indicated in Figure 4-9, the loop-back connection feature on the LFCB relay allows

the bias characteristic to be detected by injecting a bias current into one phase of the

relay and a differential current into another phase. The relay uses the higher of the two

input currents as the bias current. By slowly increasing the current in the other phase

until the associated phase contact operates, the threshold differential current at this point

can be found. The method is applied to check the dual slope characteristic of each phase

and is sufficient to fully check the functionality of each module in the LFCB. The

percentage bias settings when the current is below IS2 and when it is above IS2 are tested

respectively. The minimum operating current and the minimum operating time of each

phase are also recorded. Detailed settings of the LFCB relay can be found in Appendix

A.

4.3.2.2. Fingerprint Testing and Comparison with Contemporary Replacement

Relays

The described fingerprint testing process is performed on the selected relay samples.

The operating characteristics are then compared with a contemporary replacement relay

(i.e. Alstom P545) to determine if modern relays offer a performance improvement in

terms of operational speed and accuracy that could influence the replacement decision.

1) SHNB Testing Results:

The operational behaviours of the heavily-used SHNB 101 (i.e. 26 years in-service

time), a lightly-used SHNB 102 (i.e. 13 years in-service time) and a modern Alstom

P545 relay were tested using both static and dynamic fault based tests. The SHNB relay

has three comparator modules for three protection zones. Each comparator module

contains 6 comparators, one for each of the three phase to phase faults (i.e. A-B, A-C and B-C) and

the three phase to ground faults (i.e. A-E, B-E and C-E). To fully test the relay function, the


protection performances for each protection zone and each fault type need to be

evaluated. The reach accuracy and operation time of Zone 1, Zone 2, Zone 3 and Zone 3

offset are shown in the following Table 4-7 and Table 4-8.

The testing results verify that all the relays are operating as designed under both static

and dynamic fault testing. The reach and operating time of each zone under different

types of faults are accurate. No significant signs of degradation in relay functions can be

identified, even on the relay with an in-service time longer than its anticipated lifetime. The

tested relays show similar operational behaviours compared with the contemporary

replacement Alstom P545 relays in terms of reach accuracy and operating speed.

Table 4-7: Fingerprint Testing Results for Static Faults

                SHNB 101 (26 years)   SHNB 102 (13 years)   Alstom P545
Zone            Reach    Op Time      Reach    Op Time      Reach    Op Time

A-E Fault
Zone 1          80%      15.3 ms      79%      14.6 ms      78%      18.6 ms
Zone 2          150%     517.7 ms     148%     515.2 ms     148%     517 ms
Zone 3          199%     1030 ms      199%     1027 ms      197%     1017 ms
Zone 3 Offset   -16%     1024 ms      -16%     1023 ms      -16%     1015 ms

A-B Fault
Zone 1          81%      12.3 ms      80%      13.1 ms      79%      17.4 ms
Zone 2          153%     510.9 ms     151%     512.3 ms     150%     517 ms
Zone 3          202%     1017 ms      202%     1020 ms      199%     1015 ms
Zone 3 Offset   -16%     1023 ms      -16%     1020 ms      -16%     1021 ms

Table 4-8: Fingerprint Testing Results for Dynamic Faults

                SHNB 101 (26 years)   SHNB 102 (13 years)   Alstom P545
Zone            Reach    Op Time      Reach    Op Time      Reach    Op Time

A-E Fault
Zone 1          82%      13.1 ms      81%      14.9 ms      79%      16.7 ms
Zone 2          149%     517.7 ms     148%     520.3 ms     150%     512.3 ms
Zone 3          212%     1024 ms      212%     1021 ms      211%     1016 ms
Zone 3 Offset   -12%     1030 ms      -10%     1036 ms      -15%     1022 ms

A-B Fault
Zone 1          84%      16.7 ms      84%      14.9 ms      80%      16.7 ms
Zone 2          154%     515.9 ms     153%     511.3 ms     154%     512.3 ms
Zone 3          205%     1054 ms      204%     1045 ms      205%     1016 ms
Zone 3 Offset   no trip  -            no trip  -            no trip  -


It is worth noting that the testing result for Zone 3 offset reach is significantly different

between static and dynamic based testing. With a fixed injected current and a stable

voltage, the static testing is not sufficient to fully test the “polarising” function of the

protection device. The SHNB relay uses the “partial cross polarisation” signal to

provide directional reference for the relay comparators. During a single phase to ground

fault (e.g. A-E fault), the phase of the faulty phase voltage (VA) can be represented by

the sum of the other two healthy phases (i.e. “VB+VC”). For a two-phase or three-phase

close-in fault, the memory voltage signals (i.e. VMA, VMB and VMC) obtained during

healthy live line conditions are used as polarising signals. With an 11-cycle memory

length (220 ms for 50 Hz system), the memory polarising signal is sufficient for Zone 1

protection to clear the close-in faults. However, with Zone 3 time set to 1000 ms, the

Zone 3 offset protection would block the operation when the memory polarising signal

times out. Consequently, during dynamic testing, no trip signal was detected when

testing the relay Zone 3 offset reach.
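
The timing argument above can be checked with a few lines of arithmetic. The sketch below is a simplification that only compares the memory duration with a zone time delay; it does not model the relay's comparator logic, and the function names are hypothetical.

```python
def memory_duration_ms(cycles, system_hz):
    """Duration of the memory polarising signal in milliseconds."""
    return cycles * 1000.0 / system_hz


def polarising_available(zone_delay_ms, cycles=11, system_hz=50.0):
    """True if the memory voltage is still present when the zone timer expires."""
    return zone_delay_ms <= memory_duration_ms(cycles, system_hz)


memory_ms = memory_duration_ms(11, 50.0)       # 11 cycles at 50 Hz
zone1_ok = polarising_available(15.0)          # Zone 1 operates in roughly 15 ms
zone3_ok = polarising_available(1000.0)        # Zone 3 time setting of 1000 ms
```

With an 11-cycle memory the polarising signal lasts 220 ms, which covers the Zone 1 operating time but expires long before the 1000 ms Zone 3 timer, consistent with the absence of a Zone 3 offset trip for the close-in A-B fault under dynamic testing.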

The testing results indicate that the SHNB relay samples offer equal protection

performance compared with the modern relay. They also show that the static fault based testing,

which is normally used by UK National Grid for routine tests, may not be sufficient to

fully test the protection function (e.g. the polarising module).

2) THR Testing Results:

The operational performance of a heavily-used THR relay with an in-service time of 33

years is compared with a modern numerical Alstom P545 distance relay. The tested relay

shows similar operational behaviours compared with the modern numerical relay with no

signs of degradation in operational function. The reach and operating time of each zone

under different types of faults are accurate and as defined. Replacing the THR with

modern equipment is not likely to offer performance improvement.

3) LFCB Testing Results:

The X.21 loopback testing was carried out to test the operational behaviours of two

LFCB relay samples with different in-service times. Based on the test results, both the

heavily-used and the lightly-used LFCB samples are operating as designed. The

percentage bias settings k1 and k2 for each operational phase are proved to be accurate

compared with expectation (i.e. k1=30%, k2=150%). Fast operating times of the LFCBs

ensure that the relay could detect and trip the fault in a timely manner, with the


operational time for all the three phases within 25 ms. No significant signs of

degradation can be identified in terms of the operational performance.

Table 4-9: LFCB 103 (208284J) (9 years in-service time) Testing Results

                 Calculated K1                Calculated K2                  Min Op. Level   Min Op. Time
A Phase Tests    B Ph: 28.7%   C Ph: 29.6%    B Ph: 148.9%   C Ph: 149.4%    0.123 A         22.6 ms
B Phase Tests    A Ph: 29.4%   C Ph: 30.1%    A Ph: 151.1%   C Ph: 149.8%    0.126 A         22.9 ms
C Phase Tests    A Ph: 30.4%   B Ph: 31.5%    A Ph: 150.7%   B Ph: 151.1%    0.121 A         22.6 ms

Table 4-10: LFCB 103 (547373C) (16 years in-service time) Testing Results


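                 Calculated K1                Calculated K2                  Min Op. Level   Min Op. Time
A Phase Tests    B Ph: 29.4%   C Ph: 29.4%    B Ph: 150.6%   C Ph: 150.0%    0.116 A         21.3 ms
B Phase Tests    A Ph: 31.8%   C Ph: 31.8%    A Ph: 150.0%   C Ph: 150.0%
C Phase Tests    A Ph: 30.6%   B Ph: 30.6%    A Ph: 150.6%   B Ph: 150.6%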

The operational performance of LFCBs is next compared with the modern numerical

Alstom P545 relay in the differential protection characteristic. The results indicated in

Table 4-11 verify that the LFCB relay can provide the designed protection function with

similar accuracy compared with the modern numerical differential relay. Additionally,

operational time provided by LFCBs when detecting a fault is similar to the modern

numerical relays. Although the operation speed of LFCB is slightly faster than P545, the

difference is not important as long as timely operation can be provided. Since the

protection performance of the LFCB is comparable to that of its contemporary replacement,

replacing the LFCB with modern equipment is not likely to offer performance

improvement.

Table 4-11: Alstom P545 Differential Characteristic Testing Results


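                 Calculated K1                Calculated K2                  Min Op. Level   Min Op. Time
A Phase Tests    B Ph: 30.1%   C Ph: 30.1%    B Ph: 149.1%   C Ph: 149.8%    0.12 A          25.9 ms
B Phase Tests    A Ph: 30.1%   C Ph: 30.1%    A Ph: 151.2%   C Ph: 150.8%
C Phase Tests    A Ph: 30.1%   B Ph: 30.4%    A Ph: 149.9%   B Ph: 149.2%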


4.3.2.3. Voltage Transformer Supervision Function Testing for Distance Protection

According to the PPI report, one of the two recorded SHNB maloperations was caused

by VT fuse failure. During a VT fuse failure situation, the distance protection would

measure zero voltage for one or more of the three-phase voltages, resulting in erroneous

trips. Consequently, the distance protections must detect the voltage drop caused by a

short circuit or open circuit in the VT circuit to prevent unwanted tripping.

Therefore, the voltage transformer supervision (VTS) function of the SHNB is

tested to check whether there is any degradation in this function which caused the

maloperations.

The relay measures the negative phase sequence (NPS) components of the line voltage

and current signals to detect a voltage failure. During normal system conditions, both

NPS voltage and current levels are below the NPS thresholds. When there is an

unbalanced fault at the primary transmission system, both NPS voltage and current

signals will be above the threshold. When there is a loss of a phase voltage due to a VT

fuse failure, there will be negative sequence voltage, however, the current will be nearly

balanced so there will be no negative sequence current. The SHNB VTS function

therefore operates on detection of negative sequence voltage without negative sequence current to

block the tripping when there is a VT fuse failure.

The voltage waveform during a VT fuse failure was simulated using the Omicron

software by suppressing the voltage of one or more phases to zero at VT fuse failure

time. With the SHNB relay VTS inhibition function enabled, the VTS function

behaviour can be observed as:

When a single phase voltage (or two phase voltages) as measured by the relay

becomes zero, the relay VTS module detects the voltage failure, blocks relay

tripping and provides an alarm.

When there is a three phase simultaneous voltage failure, the VTS module does not

respond, because the fault doesn’t produce a negative sequence voltage. This is not

a practical disadvantage because of the extremely low probability of such a failure

(i.e. three VT fuses blowing simultaneously).
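
The VTS principle and the observed behaviour can be reproduced with a minimal symmetrical-component sketch. The threshold values, the 63.5 V phase voltage and the function names are assumptions made for illustration; they are not SHNB settings, and the relay's actual measurement circuit is not modelled.

```python
import cmath
import math

A = cmath.rect(1.0, math.radians(120.0))  # symmetrical-component operator a


def negative_sequence(pa, pb, pc):
    """Negative phase sequence component of three phasors."""
    return (pa + A * A * pb + A * pc) / 3


def balanced(magnitude):
    """Balanced positive-sequence three-phase set."""
    return (cmath.rect(magnitude, 0.0),
            cmath.rect(magnitude, math.radians(-120.0)),
            cmath.rect(magnitude, math.radians(120.0)))


def vts_blocks(voltages, currents, v2_threshold=5.0, i2_threshold=0.1):
    """Block tripping when NPS voltage is present without NPS current."""
    v2 = abs(negative_sequence(*voltages))
    i2 = abs(negative_sequence(*currents))
    return v2 > v2_threshold and i2 <= i2_threshold


va, vb, vc = balanced(63.5)   # assumed secondary phase voltage
load = balanced(1.0)          # balanced load current

one_fuse = vts_blocks((0, vb, vc), load)        # single VT fuse failure
two_fuses = vts_blocks((0, 0, vc), load)        # two VT fuses failed
three_fuses = vts_blocks((0, 0, 0), load)       # all three lost together
fault = vts_blocks((0.2 * va, vb, vc),
                   (cmath.rect(5.0, 0.0), load[1], load[2]))  # A-E fault
```

In this sketch the loss of one or two phase voltages produces a negative sequence voltage of one third of the phase voltage with no negative sequence current, so tripping is blocked. A simultaneous three-phase loss produces no negative sequence voltage and is not detected, and an unbalanced fault produces negative sequence current and is therefore not blocked.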

It is therefore reasonable to conclude that the two SHNB distance relays under testing

could detect the loss of voltage situations and block the tripping. The maloperation


caused by VT fuse failure was due to an application problem: the VTS function of the

particular National Grid SHNB relay was disabled.

4.3.3. Stress Testing-Simulated In-service Conditions

The stress that could potentially have an impact on relay life under its normal operating

conditions was also evaluated. The following tests are performed:

1) Input CT thermal stress test

2) Power supply component voltage stress test

3) Auxiliary energizing quantities stress test

4.3.3.1. Input CT Thermal Stress Test

Any increase in the system current level seen during normal load and fault conditions

might result in continuous thermal stress in input isolating CTs. Consequently, the

maximum fault and load current in the system as well as the relay CT rated overload

capabilities are examined to check whether the thermal stress resulting from these

currents will damage the relay CTs. The maximum fault current on the 275 kV and the

400 kV systems is acquired based on system measurements:

400 kV Network: CT Ratio: 2000:1; Maximum Fault Level: 63 kA;

Maximum Secondary Current: 31.5 A

275 kV Network: CT Ratio: 1200:1; Maximum Fault Level: 40 kA;

Maximum Secondary Current: 33.3 A
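
The maximum secondary currents quoted above follow directly from the fault levels and CT ratios. A minimal sketch of the arithmetic, with a hypothetical function name, is:

```python
def secondary_current_a(primary_a, ct_primary, ct_secondary=1.0):
    """Current on the CT secondary for a given primary current and CT ratio."""
    return primary_a * ct_secondary / ct_primary


fault_400kv = secondary_current_a(63_000.0, 2000.0)   # 63 kA through a 2000:1 CT
fault_275kv = secondary_current_a(40_000.0, 1200.0)   # 40 kA through a 1200:1 CT
```

The 275 kV case gives the higher secondary current (about 33.3 A) despite the lower fault level, because of the lower CT ratio.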

The National Grid Technical Guidance Note (TGN) [69], which specifies the ratings of

overload capabilities under heavy load system conditions, is reviewed as shown in

Table 4-12.

Table 4-12: Ratings and Assessed Overload Capabilities of Protective Relays

Protection  Rated         Max Continuous   Initial   Short-Term Overload Capabilities
Type        Current (IR)  Capability       Load      2 Min     3 Min     5 Min     10 Min    20 Min
SHNB        1 A           3×IR             3 A       6 A       5 A       4 A       3.5 A     3 A
SHNB        5 A           3×IR             15 A      15 A      15 A      15 A      15 A      15 A
THR         1, 2 & 5 A    2.2×IR           2.2×IR    3.22×IR   2.6×IR    2.2×IR    2.2×IR    2.2×IR
LFCB        1 A           4×IR             2 A       6 A       5 A       4 A       4 A       4 A

The maximum fault current and overload currents in the system are then compared with

the thermal rating of the relay CTs, as specified in the relay manuals:


SHNB:

AC Input Current Rating: IR=1A

(3×In continuously), (57.7×In for 3s), (100×In for 1s).

THR:

AC Current Input Rating: IR=2A

Maximum Continuous Current: 2.2×nominal rating

Short time Current Rating (2 secs): 50×nominal rating, 25×nominal for maximum

coarse setting.

LFCB:

AC Input Current Rating: IR=1A

(4×In continuously), (100×In of 400A for 1s).

During a system fault, the maximum fault current on the relay secondary side can reach

33.3A. For all the three relay types, the isolating CTs can be subjected to this maximum

fault current for more than 1 second, which is long enough for a circuit breaker to clear

the fault, even if the fault is cleared in back-up protection operating times. Consequently,

the isolating CTs will not be damaged by the fault current.

During a heavy load condition, with respect to the load encroachment data, the

maximum system loading for protection setting consideration is 5360 A for 400 kV and

4490 A for 275 kV. Consequently, the maximum load current is 2.7A on secondary of

400kV CT and 3.7A on secondary of 275kV CT. According to TGN requirements, the

situation when the load current exceeds three times the rated current should last no more

than 2 minutes, which ensures that none of the relay CTs will be thermally damaged during a

heavy load condition. It can be concluded that the forecast load and fault currents

will not cause thermal damage to the relay isolating CTs. Therefore, the end of life of the

relay CTs is unlikely to be accelerated by thermal stress.
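
The heavy-load comparison can be sketched in the same way. The rated currents and continuous multiples below are the values quoted from the relay manuals in this section (SHNB and LFCB at 1 A, THR at 2 A); the dictionary layout and function names are hypothetical, and the check covers continuous capability only, not the short-term overload ratings.

```python
# (rated current in A, continuous capability as a multiple of rated current)
RELAY_RATINGS = {"SHNB": (1.0, 3.0), "THR": (2.0, 2.2), "LFCB": (1.0, 4.0)}

# (maximum system loading in A, CT primary rating for a 1 A secondary)
MAX_LOADING = {"400 kV": (5360.0, 2000.0), "275 kV": (4490.0, 1200.0)}


def load_secondary_a(network):
    """Maximum load current seen on the CT secondary for a network."""
    primary_a, ct_primary = MAX_LOADING[network]
    return primary_a / ct_primary


def within_continuous(relay, network):
    """True if the maximum load stays within the relay's continuous rating."""
    rated_a, multiple = RELAY_RATINGS[relay]
    return load_secondary_a(network) <= rated_a * multiple


results = {(relay, network): within_continuous(relay, network)
           for relay in RELAY_RATINGS for network in MAX_LOADING}
```

Under these figures only the 1 A SHNB on the 275 kV network sees a load current (about 3.7 A) above its continuous rating of 3 A, which is the case covered by the TGN time limit on loading above three times rated current.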

4.3.3.2. Component Voltage Stress Test

The voltage stress testing is performed to identify any components which might have an

operating voltage approaching its rating limit. Since all the components are designed to

work under their voltage rating, tests are only limited to the components that are


subjected to voltages that are higher than analogue or digital processing circuit supply

voltages, typically the power supply components. A voltage meter is used to measure

the voltage across the components when the relay is powered up. Table 4-13 shows the

voltage stress testing results on the THR PS10 power supply unit.

Based on the test results on all the tested components of the three relay types, the

voltage applied to each component is well within the component voltage rating. This

indicates that high-quality components were used in the design of the

relay. No component has voltage stress exceeding or approaching its rating.

Consequently, component degradation is not likely to be accelerated by voltage stress.

Table 4-13: THR PS10 Power Supply Unit Components and Voltage Stress

                         Code   Product Code/description                        Voltage Rating (V)    Operating Voltage (V)
Output Regulator Board   C5     Electrolytic capacitor                          63V                   14.63
                         C6     Electrolytic capacitor                          63V                   14.71
                         D9     Small signal diode                                                    0
                         R2     W22 R33 Vitreous enameled wire wound resistor   84V                   0.27
                         R8     Axial lead polymer film resistor, 3.3kΩ         150V                  0.016
                         R12    Axial lead polymer film resistor, 470Ω          150V                  0.084
                         R21    Axial lead polymer film resistor, 1kΩ           150V                  0.03
Power Supply             T5     2N2222A 307 TO-39 type Si Planar Epitaxial      Collector-base max    29.5
(EMI:997,                       NPN high speed switch, metal can package        voltage: 75V
Date: 16/12/2011         R21    Axial lead polymer film resistor, 4.7kΩ         150V                  20.13
LCRC R6P B/C:00013006)   R25    Axial lead polymer film resistor, 3.3 kΩ        150V                  21.7
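
The margin between operating voltage and rating in Table 4-13 can be expressed as a stress ratio. The sketch below uses the rated and measured values from the table for the components that have both; the 0.5 flagging threshold and the function names are assumptions for illustration and are not criteria stated in the test reports.

```python
# (component, voltage rating in V, measured operating voltage in V) from Table 4-13
PS10_COMPONENTS = [
    ("C5", 63.0, 14.63),
    ("C6", 63.0, 14.71),
    ("R2", 84.0, 0.27),
    ("R8", 150.0, 0.016),
    ("R12", 150.0, 0.084),
    ("R21 (1 kΩ)", 150.0, 0.03),
    ("T5", 75.0, 29.5),
    ("R21 (4.7 kΩ)", 150.0, 20.13),
    ("R25", 150.0, 21.7),
]


def stress_ratio(rating_v, operating_v):
    """Operating voltage as a fraction of the component voltage rating."""
    return operating_v / rating_v


def flagged(components, limit=0.5):
    """Components whose stress ratio exceeds an assumed screening limit."""
    return [name for name, rating, operating in components
            if stress_ratio(rating, operating) > limit]


ratios = {name: stress_ratio(rating, operating)
          for name, rating, operating in PS10_COMPONENTS}
worst = max(ratios, key=ratios.get)
```

On these figures the most highly stressed component is the T5 transistor at roughly 39% of its collector-base rating, and no component exceeds the assumed 50% screening limit.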

4.3.4. In-Depth Evaluation of Modules and Components

In this section, the components on energized relay modules are characterized using

thermal imaging and non-destructive structural evaluation techniques, designed to

identify any potential life-limiting conditions. The identified hot components are

compared with rated capabilities and with electronic product industry experience related

to the reliability experience with different levels of heating. Components requiring

further investigation based on thermal imaging or any other observations are examined

using three-dimensional x-ray tomographic micro-imaging. These imaging results show


any signs of degradation or wear-out, leading to determination of whether stressed

components are still sound, and whether a specific life extension can be forecast. A

system level failure mode, mechanism and effect analysis (FMMEA) is next performed

to determine the function of each studied component in its relay module and in the

overall operation of the protective relay. The purpose is to identify particular modules or

components most likely to cause a severe problem in the fault clearance operation of the

relay.

4.3.4.1. Thermal Characterisation

A Fluke Ti100 9Hz thermal imaging camera was used to perform the thermal

characterisation for all the three relay types on a module by module basis. This

facilitates the identification of components whose temperatures rose above the ambient.

A detailed module-by-module analysis for each relay type can be found in the

individual testing reports [70-72]. The in-depth component evaluation of the LFCB

relay is illustrated in this section.

Figure 4-10 and Figure 4-11 show an LFCB disassembled for thermal imaging.

Components that were identified to operate above the ambient temperature and

components close to the hot spots are considered to be most vulnerable to the

degradation mechanisms driven by temperature, such as creep and fatigue of device

interconnections within integrated circuits. A brief conclusion of the thermal imaging

observations on each module of the LFCB is described as follows:

Module 1: Power Supply (GM0026013A): A number of elevated temperature zones

(hotspots) were identified in the power supply module during operation (see Figure 4-10).

These include a diode (D34), a voltage regulator (IC47) and three resistors.


Figure 4-10: Thermal Images and Components within LFCB Power Supply Module

Modules 2 & 3: Relay Outputs 1 & 2 (GM0032001A): Modules appear identical, and in

both, a single hotspot was identified as a film resistor (R54), which reached about 31°C,

and is shown in Figure 4-11.

Figure 4-11: Thermal Images on Components within Modules 2&3 (Relay Outputs 1&2)

Module 4: Communications controller (GM0052021): This module consists of two

PCBs: a communications controller board and a communications interface board. A

number of components reached above ambient temperatures during operation on both


boards. These are three voltage regulators (IC23, IC24, IC25), two logic ICs (IC34,

IC1), and a variable resistor (RV1).

Module 5: Microcomputer module (GM024001AZ): This module consists of a single

PCB, and exhibited three components operating at above-ambient temperatures. These

were two voltage regulators (IC28 and IC29) and a logic IC (IC1) reaching 36°C and

31°C respectively.

Module 6: Analogue & status input module (GM0036001A): In Module 6, two

components exhibited above ambient operation. These were a voltage regulator (IC11)

and a PDIP IC, reaching 31°C and 33°C respectively after a few minutes of operation.

Module 7: Current Transformer input module: This module was not imaged. It was

determined in stress testing in Section 4.3.3 that the current transformers are not close to

their rated capacity due to fault currents in the National Grid transmission system.

A list of imaged modules and their components with maximum observed temperature is

shown in Table 4-14. These identified components are operating at a temperature above

that of the surrounding components. However, at no more than 36°C, the components

are not thermally stressed and do not operate at a significant fraction of their power dissipation

capability or operating temperature limit.

Table 4-14: Thermal Imaging of LFCB Relay and Examined Hot Components

Module   ID           Module/Function                    Hotspots     Component(s)
1        GM0026013A   Power supply                       Yes          Voltage regulator (IC47, IC9), resistors x3, diode D34
2        GM0033001A   Relay output 1                     Yes (31°C)   Resistor R54
3        GM0033001A   Relay output 2                     Yes (31°C)   Resistor R54
4        GM0052021A   Communications controller          Yes (32°C)   Voltage regulator (IC23, IC24, IC25), IC34, IC1, resistor RV1;
                                                                      IC14, R27, R28 (interface board)
5        GM024001AZ   Microcomputer module               Yes (36°C)   IC28, IC29, IC1
6        GM0036001A   Analogue & status input module     Yes (33°C)   IC11
7                     Current transformer input module   Not imaged

4.3.4.2. Detailed Structural Investigation via 3D X-ray Microtomography

Based on the findings via thermal imaging, a number of components were identified as

potentially vulnerable to thermal degradation mechanisms. A selection of these


components was therefore subjected to a detailed structural investigation using 3D X-ray

microtomography, which greatly facilitates the non-destructive observation of the

internal structure of engineering materials and structures [73]. This technology enables

cracks and defects to be observed three-dimensionally without destroying the specimen

or compromising the results. X-ray images or projections of a sample are acquired from

a rotating specimen by a stationary detector. These images are reconstructed into a

three-dimensional volume using computer software. Multiple ‘virtual’ cross-sections (or

slices) can be obtained in any plane of interest. The aim of the testing is therefore to

ascertain any existing structural damage or degradation within the components and the

levels of damage thereof. The tomography imaging was performed on an Xradia Zeiss

Versa-XRM500 CT system. A list of identified vulnerable components which require

in-depth component investigation for each relay type is given in Appendix B.

The X-ray tomography study on the LFCB voltage regulator is provided in this section

for illustration purposes. As shown in Figure 4-12, the voltage regulators on the

communication interface board (Module 4) from LFCB relays with 8 and 15 year

service histories were studied and compared.

Figure 4-12: X-Ray Tomography Images of LFCB Voltage Regulator IC14, Module 4:

(a): 8-year old relay; (b): 15-year old relay;

Of the regulators studied, the void area in the die attachment ranged from 1.08% to

9.08%, with no pattern of lifting or separation from the substrate. Wire bonding is sound.

None of the observed imperfections impact the reliability of the regulator in this service.

The origin of the voids cannot be determined – they may have been present when the

regulator was new.

Figure 4-13: X-Ray Tomography Images of Voltage Regulator IC23, 15-year Old Relay

The studies noted that the chip with 9.08% void area came from a relay with 8 years of

service, while Figure 4-13 shows a similar regulator with no visible voids from a relay


with 15 years of service history. There is no evidence that ageing in service is affecting

the reliability of these regulators. By contrast, Figure 4-14 shows an image of a

regulator from a different piece of equipment (not a protection relay) with bonding

separation that risks failure. Note that this image is photographically the negative of the

X-ray images in Figure 4-12 and Figure 4-13 – the dark area is sound and the light area shows separation.

Figure 4-14: Acoustic Microscopy Images showing the Evolution of Degradation in a

TO-220 Package Die Attachment during Thermal Cycling

The detailed structural investigation was undertaken on a number of components in each

relay type as listed in Appendix B. The components are considered more susceptible to

thermally activated degradation mechanisms. Samples of these components, mainly

transistor/IC packages, were extracted from relays with different service life history.

Particular attention was paid to die attachments and wire bonds. Signs of packaging-

related damage, i.e. die attachment voiding and cracking were observed. It is not

possible to say whether the observed damage was present in the as-manufactured

condition, or whether it evolved during operation. Overall, the damage observed in

components was not extensive. The percentage void area beneath die attachments

never exceeded 9.08%. Thus, although a gradual degradation in thermal resistance

and electrical performance is expected over time, under the typically benign ambient

environmental conditions and in the absence of significant temperature cycling,

significant acceleration of the observed degradation mechanisms is unlikely. In addition,

no signs of bond wire failure were observed. The detailed findings and experimental

tests for each relay's components can be found in the individual testing reports [70-72].


4.3.5. System Level Failure Mode, Mechanism and Effect Analysis

If vulnerable components, or components with ageing related degradation, can be

identified, a detailed FMMEA can be performed to determine the impact of the

component on the overall relay operation. A relay spends the majority of its life in an

energised but quiescent state, where it is monitoring a healthy but live transmission line.

The two most common failure modes of power system protective relays are:

Failure to trip when required.

Mal-trip when not required.

Other types of failure mode may not affect the clearance of the fault but could affect

other applications:

Correct trip, but incorrect operation of other outputs (e.g. indicator lamps, auto-

reclosing signal).

System level FMMEA was performed to indicate all the relay operational failure modes

that can be caused by each relay module. Functionalities of each printed circuit board

are described for the modules containing multiple PCBs. A detailed description can be

found in the individual report for each relay. This helps National Grid identify the

impact of component failure on relay behaviour and evaluate the risks of extending the

lifetime of the relay. It can be concluded from the FMMEA that none of the components likely

to cause an operating problem is also ranked as likely to fail.

4.4. Conclusions and Future Works

Each of the three relay types yielded consistent evaluation results and demonstrated

eligibility for an asset life extension. Based on the condition and deterioration observed on

the potentially vulnerable components of the relay samples approaching their designed lifetime,

an initial extension of five years for each relay type was proposed. The conditions of the

vulnerable components are recommended to be tested again after the five-year extension.

This decision is made since no significant signs of degradation were observed. However, if

the relay components show significant signs of ageing related degradation, accelerated

lifetime testing (ALT) is recommended to determine by how much the lifetime

can be extended. There is future opportunity for further extension by focused



rechecking of the most stressed components as documented in the individual detailed

test reports.

Since the tested relay types continue to perform reliably with no increase in failure rates

or component degradation over many years of service, the flat failure-rate trajectory

does not forecast any specific end of asset life. The proposal to extend asset life by five

years comprises a service life extension of only 15% of the time for which the oldest

evaluated unit has already served. The service life extension is further supported by

thorough technical evaluation of any failure that occurs during the extended life interval,

and re-evaluation of the policy change if any unforeseen failure pattern arises.

4.4.1. Recommendations

Table 4-1 indicated the range of asset lives for the relay types studied as defined by the

National Grid Policy Statement EPS 12.08, Issue 6. Since all known early-failure

vulnerabilities have been corrected in relays in service today, the important limits are

the anticipated asset life and the latest onset of significant unreliability. Based on the

evaluation results, the following recommendations were given:

1) Based on results of project evaluation, this report recommends the following 5-year

extensions for all of the three relay types as shown in Table 4-15:

Table 4-15: Recommended Relay Lifetime based on Evaluation Results

| Relay type | Equipment type | Anticipated asset life (recommended) | Earliest onset of significant unreliability (recommended) | Latest onset of significant unreliability (recommended) |
|---|---|---|---|---|
| SHNB and THR | Complex electronic relays (transistorised or integrated circuits) | 30 years | 25 years | 40 years |
| LFCB | Digital (A/D converter and microprocessor) | 25 years | 15 years | 30 years |

2) For any targeted relay that fails in service during the next five years, a project team

established by National Grid Protection Engineering shall convene to investigate

the failure and report results, including any impact on replacement life policy, and

on conclusions of this set of reports.

3) For each of the three relay types, a National Grid Technical Document should be

developed specifying the evaluation process to be carried out if units in service are



still performing reliably in 2020. The following steps for one to two aged samples

of a relay type should be included to evaluate any changes from the condition

observed in the present study:

a) Physical inspection.

b) X-ray tomography of specifically targeted ageing components.

c) Fingerprint testing to the baseline established in the respective full test report.

d) See Section 8.10.6 of SHNB Report [70], Section 8.7.6 of THR report [71], or

Section 9.9.7 of LFCB Report [72] for recommendations.

In particular, the X-ray tomography of stressed components already identified

in the present study, performed on one to two relay samples, will show any

new ageing evidence that was not observed in the present work.

4) Complete repetition of the large study documented in full reports [70-72] is not

required on retest.

5) Document the review conclusion for each relay type:

a) Relays are reaching end of service life; revised policy above is appropriate.

b) Relays remain reliable, without impending failures. Propose to further update

replacement policy to extend asset life by an additional five years, with another

recheck in 2025.

6) Consider energizing relay spares periodically to restore the dielectric layer within

electrolytic capacitors.

a) Energize at least annually, and for at least one hour. Observe apparent

operational state via relay panel indications.

b) Assess adequacy of spares inventory for each relay type whose asset life has

been extended.

c) This procedure is valid for all relay types, but is particularly valuable for

ageing relay spares of the types for which this report recommends extension of

service life.

The rated life values tabulated above have all been increased by 5 years for consistency

with the existing policy rating process. It is worth noting here that the earliest-onset life

ratings could be revised to reflect the reliable service history for the significant

populations of each type currently in service. However, this will have no practical

impact on replacement plans, which are driven by anticipated and latest-onset life ratings.



4.4.2. Further Work and Application to other Equipment Types

The application of the evaluation process to validate asset life extension decision for the

selected relay types has been described in this chapter. In addition, this process is also

effective for use on electronic systems built from components similar to those of SHNB,

THR, and LFCB, which incorporate electronic technologies ranging from transistorised

circuits to microprocessor-based systems with other large-scale integrated circuits and

power semiconductors.

If certain relays (or other critical equipment types) are serving reliably as they approach

policy end-of-life, they are potential candidates for this evaluation and for possible asset

life extension. With experience gained in the present study, it is possible to conduct

these studies in the most efficient manner. A history of reliable service for a large

population of products can be used as a good pre-filter for choosing products to study.

This report recommends that results for the studied relays must not be broadly applied

to other similar devices, unless the electronic design employs the same components or

hardware platform. To the extent that another product shares some of the same design,

only the elements that are different (and the overall fingerprint test performance) need

be evaluated. Otherwise, a full study should be carried out. It is also possible that

product study may reveal unforeseen risks by showing design weaknesses, degradation,

or impending failures that could not be observed by visual inspection or by normal

functional behaviour – hence such products may not deliver anticipated asset life. While

this is not the result that an asset life extension study would hope to establish, it is

valuable to know.

4.5. Summary

The protection and control asset end-of-life analysis carried out in this chapter

effectively evaluated the operational condition of some of the commonly used electronic

and early numerical protective relay types in the National Grid 400 kV and 275 kV

transmission networks. The protection maloperation record indicates that all the selected

relay types are serving in a highly reliable manner, with very few maloperations

attributed to relay hardware failures. Accordingly, all the devices are still in their useful

life time with random failures, and no statistical evidence of vulnerable components or



modules could be identified. An evaluation process combining statistical analysis and

sample testing is developed to identify the life limiting factors and used to estimate the

reliable service lifetime for each relay type.

Operational behaviour of the studied relays was tested and compared with modern

numerical relays by performing static and dynamic fault based fingerprint testing. No

degradation in protection function can be identified, and all the studied relays offer

equal performance in operational speed and accuracy for their intended functions as

compared to modern relay types. Stress testing was performed to identify components

which operate under voltage, thermal or current stress during relay normal operation

state. It has been shown that the increase in system fault current level and heavy

overload current level will not cause additional thermal stress for the selected relay input

CTs. Components that are identified to operate above the ambient temperature are

considered to be most vulnerable to the degradation mechanism driven by temperature,

such as creep and fatigue of device interconnections within integrated circuits. Samples

of these components, mainly transistor/IC packages, were extracted from relays with

different service life history for 3D X-ray microtomography in-depth study. The non-

destructive structural evaluation techniques indicate that no signs of degradation or

wear-out can be found on the studied relay types. Each of the three relay types yielded

consistent evaluation results and has demonstrated eligibility for an asset life extension.

Based on condition and deterioration observed to date an initial extension of five years

for each relay type is proposed. In addition, effective maintenance and condition

monitoring strategies are recommended to National Grid to ensure effective

maintenance on these protection assets and provide timely updates if there is any new

ageing evidence.

The critical function of the protection devices in initiating the protection scheme and

preserving system reliability has been discussed in the previous chapters. The

evaluation process proposed provides an effective method for utilities to assess the

condition of the protection devices to ensure their reliable operation and an optimal

replacement plan.


CHAPTER 5

RISK ASSESSMENT OF A SYSTEM INTEGRITY PROTECTION SCHEME

5.1. Literature Review of SIPS Reliability Assessment Method

As described in the previous chapter, the increased application of SIPS and high

financial penalty of SIPS maloperation make it necessary to assess the SIPS reliability

on a regular basis. In addition, the introduction of modern ICT devices and IEDs, and

the application of new communication protocols in SIPS design make its reliability and

performance a great concern. All these changes in SIPS call for a method which

effectively assesses the possible risks induced by SIPS operation. The previously

developed SIPS reliability criteria were reviewed in Chapter 2. These could be used as

reliability standards to evaluate the performance of SIPS and various reliability

enhancement methods. In this section, the risk assessment methods proposed in the

previous literature are reviewed and their effectiveness in the modern electrical

network is analysed.

A generic procedure for risk-based assessment was first developed by Fu et al. in [74]

to determine the arming strategy of a SIPS. The optimal arming point was determined

by comparing the operational risk of the test system with and without SIPS. The


reliability assessment method is based on a combination of Markov Modelling and

failure mode and effect analysis (FMEA). It is argued that Markov Modelling is well

suited for SIPS reliability assessment because of its flexibility to account for various

common features and operational states in SIPS. Specifically, it could incorporate

independent and common cause failures, partial and full repairs, maintenance and

diagnostic coverage. The reliability assessment method described in reference [74] is

applied and developed by Esmaili [75] to assess the risk of SIPS based on the online

measurements and determine the optimal data window for SIPS performance prediction.

The optimal starting time and width of the data window used to predict its performance

are evaluated to provide the most timely actions to maintain the stability of a continuously

changing Power System.

Besides Markov Modelling, the reliability block diagram (RBD) and fault tree analysis

(FTA) are often adopted in industries to quantify the risk of SIPS [25, 76]. Reference

[25] quantified the probability of SIPS dependability and security based maloperations

using FTA, which was then applied to the Dinorwig intertrip scheme in North Wales.

Two reliability indices (i.e. SIL and STL) were enforced as reliability criteria to

quantify the level of SIPS maloperations. The assessment results showed that FTA is a

flexible method to model complex protection scheme structures and handle data

uncertainties. Human operator errors and software problems preventing the scheme

being armed when required are considered in the risk evaluation.

The risk assessment models proposed by Miguel [77, 78] considered the impact of

uncertainty in the protection system, communication system and the correlation

coefficients between generation and demand by generating random samples using

normal distribution. The models focus on the trade-off between capability costs, operational benefits, and the

risks associated with SIPS operation. The research in [79] used con-resistant

trust to quickly identify the maloperation of the protection scheme to mitigate system

instability. The con-resistant trust mechanism allows SIPS to assess the cooperative and

defective behaviours of different load points based on the periodical report. This helps

the decision making in load shedding to ensure the stability of the system frequency. It

was shown that a load shedding protection scheme with the con-resistant trust mechanism

was able to keep the steady-state frequency above the threshold with a high degree of

uncertainty in a Power System. In this assessment, historical data are used to predict the



SIPS operation risk in the future. This significantly improves the accuracy and

effectiveness of the risk assessment model.

It has been shown that reliability block diagrams, Markov Modelling and Fault tree

analysis are suitable methods to get an overview of SIPS reliability. However, the

deployment of modern ICT significantly increases the complexity in SIPS structure.

The impact of the digital communication system on SIPS needs to be reflected in the

reliability assessment procedures. Consequently, a detailed study of the ICT used in

SIPS and its communication architectures are provided in this chapter. The proposed

method also effectively analysed the possible failure modes coexisting in each

component and SIPS. In addition, the continuously changing system conditions due to the

integration of renewable energies and demand side management make it difficult to

analytically assess the consequence of SIPS maloperations. The Monte Carlo

Simulation is more appropriate and accurate in evaluating SIPS risk under various

system conditions [80], but has not yet been applied to SIPS risk assessment. Consequently, a

method based on Sequential Monte Carlo simulation is provided in Chapter 6 to capture

the variation in SIPS risk with varying system condition.

The literature in this area provides not only reliability assessment methods for SIPS

evaluation, but also components’ reliability data and cost data for various SIPS

operations. The quality of the data greatly affects the usefulness and accuracy of

the risk assessment results. Therefore, to objectively evaluate the scheme risks, the

reliability and cost data need first to be agreed upon. These data can be obtained from

three main sources: 1) actual data can be acquired from a systematic measurement and

collection process; 2) data used in the previous publications such as databases or

handbooks; 3) data suggested by the experts and experienced engineers. Among them,

the actual data can best reflect the scheme performance. If assumed data are used, then

by performing sensitivity analysis, the impact of variation in the reliability data on

overall scheme risk calculations can be observed. It can also help to identify the most

critical components or the operational phase of a protection scheme.
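A one-at-a-time sensitivity study of this kind can be sketched as follows. This is a minimal illustration only: the series failure model, the component names and all numerical values are assumptions made for the example and are not data from this thesis.

```python
def series_failure_prob(q):
    """Failure probability of independent components connected in series."""
    survive = 1.0
    for qi in q.values():
        survive *= 1.0 - qi
    return 1.0 - survive


def one_at_a_time(q, factor=10.0):
    """Scale each failure probability in turn and record the change in the result."""
    base = series_failure_prob(q)
    deltas = {}
    for name in q:
        perturbed = dict(q)
        perturbed[name] = min(1.0, q[name] * factor)
        deltas[name] = series_failure_prob(perturbed) - base
    return deltas


# Hypothetical component failure probabilities (assumed values).
q = {"IED": 1e-3, "switch": 5e-4, "WAN link": 2e-3}
deltas = one_at_a_time(q)
most_sensitive = max(deltas, key=deltas.get)
```

The component with the largest change is the one whose reliability data most strongly influence the overall result under the assumed model.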

5.2. SIPS Risk Assessment Procedures

Based on the previous method and a thorough study of existing Power Systems and

SIPS infrastructure, a SIPS risk assessment procedure is proposed in this section to



evaluate the risk introduced by SIPS operation and possible maloperations. Normally,

three basic steps are involved in the risk assessment of SIPS:

1) Reliability Assessment: the reliability assessment includes a thorough study on the

SIPS infrastructures and logics and identifies the possible failure modes. The

probability of each SIPS failure mode is then estimated using a reliability assessment

method.

2) Impact Assessment: the consequences of different SIPS operating states under

various system conditions are estimated in terms of financial losses, which reflect the

severity of the impact on the overall Power System. The consequences of a SIPS

maloperation vary with the remedial control actions deployed by the SIPS, its failure

modes and the Power System conditions at the incident.

3) Risk Assessment: system risk is expressed as the product of each state’s probability

and its corresponding undesirable financial impact.
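The three steps above can be combined in a few lines. The sketch below assumes that the total risk is the sum of the per-state products over the three SIPS states considered in this chapter; the probabilities and financial impacts are hypothetical values chosen for illustration.

```python
def sips_risk(states):
    """Total risk: sum over states of Pr(state) times its financial impact."""
    total_prob = sum(p for p, _ in states.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * impact for p, impact in states.values())


# Hypothetical (probability, financial impact in pounds) pairs; assumed values.
states = {
    "successful operation": (0.990, 0.0),
    "DBM": (0.004, 5_000_000.0),
    "SBM": (0.006, 1_000_000.0),
}
risk = sips_risk(states)
```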

5.2.1. Reliability Assessment

The applications of some typical reliability assessment methods are first discussed in

this section. Figure 5-1 shows a typical procedure for SIPS reliability assessment.

[Figure 5-1 flowchart: Identification of Relative SIPS Basic Components; FMEA on SIPS Component; Estimate Probabilities of each Component Operational State (Markov Modelling); Combine Individual Component’s Operational State (Reliability Block Diagram); Compare Results with Reliability Requirements; Sensitivity Study (Wide Range Method & Risk Reduction Worth)]

Figure 5-1: SIPS Reliability Assessment Procedures



The first step is to identify all the relevant SIPS components. This requires a thorough

study on the physical layout, operating logic and purpose of the investigated scheme.

Next, the possible failure modes of each basic component and their impact on SIPS

performance are examined via Failure Mode and Effect Analysis (FMEA). The

reliability assessment procedure is carried out based on a combination of a Markov model

and a reliability block diagram (RBD) and is used to quantitatively assess the

probabilities of each SIPS operational state. Once the probabilities of being in DBM

state and SBM state are identified (i.e. Pr(DBM) and Pr(SBM)), the reliability index

can be compared with the standards (e.g. SIL and STL) to determine whether the

scheme meets the reliability requirement. Next, sensitivity studies are applied to

investigate the impact of variation in the reliability data on the assessment results. The

importance of each SIPS component on the reliability performance can also be

identified.

The reliability assessment methods applied in the SIPS reliability assessment

procedure are described below:

Failure Mode and Effect Analysis

Failure Mode and Effect Analysis (FMEA) is a systematic method designed to identify

the failure mode of a system in a “bottom-up” way. The entire protection system can be

hierarchically divided into several subsystems and modules and then analysed one

component at a time. SIPS components, which were identified through the first step in

reliability assessment, are considered as the basic components in the FMEA. A

component-level FMEA is first carried out to determine the possible failure modes of

each SIPS component based on its function and failure mechanism. The impact of

component failures on the performance of SIPS is next determined using a system level

FMEA.

Markov Modelling

Once the failure modes have been determined, the state probability of the components

and the frequency of entering each state at a given time need to be determined. Markov

Modelling is carried out to involve all the mutually exclusive states that a SIPS

component can exist in and to reflect the random behaviour of component state that

varies with time or space. The system transition from one state to another is driven by

either a system failure or a system repair. Once the failure and repair rates of each



component are known, the state probabilities of the components being in each

operational mode identified by the FMEA at a specific time in the future can be

calculated. In addition, the failure and repair actions can be effectively reflected in the

reliability study.
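As a minimal illustration of this calculation, the sketch below evaluates the simplest possible Markov model: a single repairable component with one up state and one down state and constant failure and repair rates. The closed-form solution used here is the standard result for that two-state model; the rate values are assumptions, and the multi-state component models used in this chapter would require solving a larger set of state equations.

```python
import math


def state_probabilities(failure_rate, repair_rate, t):
    """Return (P_up, P_down) at time t for a component that starts in the up state."""
    total = failure_rate + repair_rate
    p_down_steady = failure_rate / total
    p_down = p_down_steady * (1.0 - math.exp(-total * t))
    return 1.0 - p_down, p_down


lam = 0.02  # failures per year (assumed value)
mu = 50.0   # repairs per year (assumed value)
p_up, p_down = state_probabilities(lam, mu, t=1.0)
```

For times much longer than the mean repair time, the down-state probability approaches the steady-state value lam / (lam + mu).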

Reliability Block Diagram

The impact of each individual SIPS component’s operational state on fulfilling a certain

SIPS function is determined using the Reliability Block Diagram (RBD). The RBD (or

Network Modelling), which is a success-oriented network describing the function of the

system, is built to describe the logical connections of components needed to fulfil a

specific operation in SIPS application. SIPS components are represented as a number of

interconnected functional boxes. The resulting network is composed of components in

series, in parallel, or in combination configurations depending on the function needed.

A successful operational function can be viewed as a success path from left to right of

the RBD. Mathematical methods can be applied in combination with the RBD to

quantitatively evaluate the success and failure probabilities, e.g. Tie Set Method, Cut

Set Method, Conditional Probability Approach, Event Tree, etc.
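For networks that reduce to series and parallel combinations, the success probability can be evaluated directly, as in the following sketch. Independence between components is assumed, and the block structure and numerical values are hypothetical.

```python
import math


def series(*success_probs):
    """All blocks must succeed: product of the success probabilities."""
    return math.prod(success_probs)


def parallel(*success_probs):
    """At least one block must succeed: one minus the product of failure probabilities."""
    return 1.0 - math.prod(1.0 - p for p in success_probs)


# Hypothetical example: two redundant IEDs in parallel, in series with one switch.
success = series(parallel(0.99, 0.99), 0.999)
```

Networks that cannot be reduced in this way require the tie set, cut set or conditional probability methods named above.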

5.2.2. Impact Assessment

The consequences following a SIPS maloperation may be significantly different and

vary with the system condition at the time of failure. Therefore, the impact of each

studied SIPS operational state is estimated in terms of financial losses under a wide

range of system conditions. The impact of SIPS maloperation includes financial losses

associated with equipment outage, generation curtailment, energy redispatch and load

shedding.

For each SIPS, the consequences of the three SIPS states need to be assessed: successful

SIPS operation, SIPS DBM and SIPS SBM. In addition, as described in the future UK

energy scenarios [81], significant changes in the UK energy composition are expected to

take place in the coming decades. The deployment of large-scale wind energy and the

growth in the weather-dependent distributed generation call for a precise model which

can reflect the variation and uncertainty in both generation and load demand and can

increase the degree of accuracy in the SIPS impact assessment. The method is further

illustrated in the numerical studies.



5.2.3. Risk Assessment

Knowing the probability and impact of each SIPS operational state, the risk from each

SIPS operation is defined as the probability of the state weighted by its corresponding

financial impact. Both analytical and stochastic methods can be applied for risk

calculation:

1) Analytical Risk Assessment: The analytical risk assessment is more suitable for

analysing an event-based SIPS with predefined protection strategies. It is

normally applied to a simple system with limited variation in load and

generation. The risk from each SIPS operational state is acquired as the

probability of the initiating events weighted by its corresponding financial

impact. The method significantly simplifies the computation procedure by

transferring the system conditions into a multi-level model. However, with more

complex SIPS operational logic and with the integration of more renewable

generation, this method has its limitation in precisely modelling the uncertainties

and variations in system condition.

2) Stochastic Risk Assessment: A SIPS risk assessment procedure based on

sequential Monte Carlo simulation (SMCS) can better reflect the time dependent

SIPS events and the time series feature of the dynamic load profile and the

generation output. The probability of being in, and the frequency of

encountering, each SIPS operational state can be mapped to represent different

scheme behaviours. A dynamic load profile and generation prediction models

can also be integrated into the SMCS procedure to evaluate SIPS risk under a set

of different system conditions.
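The chronological sampling idea behind the stochastic approach can be sketched for a single two-state element as follows. This is an illustrative simplification, not the SMCS procedure developed in this thesis: exponentially distributed times to failure and repair are assumed, the rates are hypothetical, and no load, generation or financial impact model is included.

```python
import random


def simulate(failure_rate, repair_rate, horizon, seed=0):
    """Sample one chronological up/down history and return
    (fraction of time spent down, failures per unit time)."""
    rng = random.Random(seed)
    t = 0.0
    up = True
    down_time = 0.0
    failures = 0
    while True:
        rate = failure_rate if up else repair_rate
        end = min(t + rng.expovariate(rate), horizon)
        if not up:
            down_time += end - t
        if end >= horizon:
            break
        t = end
        up = not up  # a failure or a repair has occurred
        if not up:
            failures += 1
    return down_time / horizon, failures / horizon


# Hypothetical rates per unit time (assumed values).
unavailability, frequency = simulate(failure_rate=1.0, repair_rate=9.0,
                                     horizon=20000.0, seed=1)
```

The first returned value estimates the probability of being in the failed state and the second estimates the frequency of encountering it, which are the two quantities referred to above.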

5.3. SIPS Communication Infrastructure Modelling

5.3.1. Introduction of Studied SIPS Communication Architectures

As described in Chapter 2, operation of SIPS is increasingly reliant on a robust

communication network and the instrumentation, monitoring, communication, control

and protection systems made available by modern IEDs and communication protocols.

Therefore, the first step in SIPS risk assessment is to determine the communication

infrastructures and the operating logic of the SIPS. A substation based sensor IED



becomes a node of SIPS which collects information such as breaker status, current and

voltage signals and phasors.

As shown in Figure 5-2, the studied SIPS communication architectures are represented

as a number of interconnected functional boxes. Four digital substation based

communication architectures for a Generator Rejection Scheme (GRS) application are

proposed, considering redundancy at different levels in the SAS. For a GRS line-outage

detection system, the measurements from the primary network are collected by the bay-

level IEDs and then sent to the station host computer via the substation automation

system (SAS). The information can then be used for either local decision making or sent

to the control centre through the wide area networks (WAN) for centralised decision

making. A redundant WAN communication path in a hot stand-by mode is provided to

enhance the availability of the scheme.

[Figure 5-2 diagram. Labels: CB1 to CB4; PB, PB1, PB2; IED1, IED2; LAN, LAN A, LAN B; WAN; SIPS Control Centre; Substation Automation System (SAS); Arch1 to Arch4; Sub #1 to Sub #4; Line outage information]

Figure 5-2: Protection and Communication Architecture of a GRS.

CB: circuit breaker, PB: process bus communication system, IEDs: Intelligent electronic

devices, LAN: local area networks, WAN: wide area network

Due to the critical line-outage detection function, the IEDs are implemented redundantly

in all the SIPS designs. The IEC 61850-9-2 substation process bus [82] receives the

voltage and current signals digitised by the merging units (MU), and communicates

the data to the bay level IEDs. In Arch2 and Arch4, independent process bus

communication systems are provided for each bay IED. This is achieved by duplicating

the bay level Ethernet switches and the connected devices.



Measurements provided by the sensor IEDs are collected by the substation computer

over the substation local area network (LAN). The advent of IEC 62439-3 Parallel

Redundancy Protocol (PRP) allows the bay IEDs to operate via two separated and

independent LANs as indicated in the last two architectures, Arch3 and Arch4. The

IEDs could simultaneously send duplicated Ethernet packets through these two LANs

(i.e. LAN A & LAN B). Consequently, if one data frame fails to reach the host

computer due to traffic, the computer can still receive the required data from the other

network without any reconfiguration time, hence providing seamless redundancy.
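The receiving side of this mechanism can be sketched as follows. The sketch is a deliberate simplification: frames are identified by a hypothetical (source, sequence number) pair and remembered indefinitely, whereas a real PRP node works on a bounded sequence-number window. The loss probabilities are assumed values and the two LANs are assumed to fail independently.

```python
class DuplicateDiscard:
    """Deliver the first copy of each frame and drop the later copy (simplified)."""

    def __init__(self):
        self._seen = set()

    def receive(self, source, sequence):
        key = (source, sequence)
        if key in self._seen:
            return False  # copy already delivered via the other LAN: discard
        self._seen.add(key)
        return True  # first copy to arrive: deliver to the application


def delivery_failure_prob(loss_a, loss_b):
    """Probability that both copies are lost, assuming independent LANs."""
    return loss_a * loss_b


receiver = DuplicateDiscard()
delivered_first = receiver.receive("IED1", 7)   # copy arriving over LAN A
delivered_second = receiver.receive("IED1", 7)  # copy arriving over LAN B
both_lost = delivery_failure_prob(1e-3, 1e-3)   # hypothetical loss probabilities
```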

For each line outage detection system, two independent breaker status signals can be

received from the redundant IEDs. The scheme compares the outputs from the

redundant system prior to issuing an operation [83]. Therefore, two different tripping

logics can be programmed into each design:

1) Voting (1-out-of-2): if one of the two systems detects a line-outage, the logic solver

actuates the trip decision to initiate the scheme.

2) Vetoing: the logic solver validates the decisions made by the redundant systems

prior to issuing any trip decision. If the outputs of each system are different, the

system vetoes the trip decision.
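The trade-off between the two logics can be illustrated with a short sketch. It assumes two independent, identical systems, each with a hypothetical probability p_miss of failing to detect a real line outage and p_false of spuriously reporting one; vetoing is interpreted here as tripping only when both systems agree.

```python
def trip_decision(system_a, system_b, logic):
    """Combine the two detection outputs according to the selected logic."""
    if logic == "voting":
        return system_a or system_b   # 1-out-of-2: either system can initiate
    if logic == "vetoing":
        return system_a and system_b  # disagreement blocks the trip
    raise ValueError("logic must be 'voting' or 'vetoing'")


def voting_1oo2(p_miss, p_false):
    """Scheme fails to trip only if both miss; trips falsely if either is spurious."""
    return {"Pr(DBM)": p_miss ** 2, "Pr(SBM)": 1.0 - (1.0 - p_false) ** 2}


def vetoing(p_miss, p_false):
    """Scheme fails to trip if either misses; trips falsely only if both are spurious."""
    return {"Pr(DBM)": 1.0 - (1.0 - p_miss) ** 2, "Pr(SBM)": p_false ** 2}


# Hypothetical per-system probabilities (assumed values).
voting_result = voting_1oo2(p_miss=0.01, p_false=0.001)
vetoing_result = vetoing(p_miss=0.01, p_false=0.001)
```

Under these assumptions, voting favours dependability at the expense of security, while vetoing does the opposite.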

5.3.2. Communication System Modelling

The studied SIPS communication architectures are represented as a number of

interconnected functional boxes using an RBD. In particular, RBD models for the

substation process bus, substation LANs and SDH/SONET WANs are developed

considering different reliability criteria (i.e. dependability or security).

1) Substation Process Bus Sensor Network Architectures:

Different process bus sensor network architectures were illustrated in Figure 3-9. The

RBD models to assess the dependability and security of the sensor network with

duplicated process bus and a 1-out-of-2 voting logic are shown in Figure 5-3. The RBD

for dependability is built by connecting the components used to fulfil the detection

function in series, whilst putting the redundant components in parallel. Consequently,

failures in one of the duplicated devices will not affect the successful operation of the

function. For the RBD developed to evaluate system security, the scheme will

trip if either system falsely generates an activation signal when a 1-out-of-2 logic



is applied. Therefore, the RBD model is constructed by connecting all the elements

capable of causing security failures in series as shown in Figure 5-3 (b).

[Figure 5-3 diagram: (a) duplicated blocks TS 1/TS 2, IT 1/IT 2, MU 1/MU 2, BIED 1/BIED 2, SW 1/SW 2 and IED 1/IED 2, each path with EM×5; (b) series blocks TS×2, IT×2, MU×2, BIED×2, SW×2, IED×2]

Figure 5-3: RBD to Assess the Dependability (a) and Security (b) of the Substation Sensor Network

2) Station Bus (LAN) Architectures:

[Figure 5-4 diagram. Labels: A Frame, B Frame; Switch 1, Switch 2, Switch 3, Switch 4, Switch 5, Switch 10; Sender; Receiver 1, Receiver 2; Primary Path; Secondary Path]

Figure 5-4: Communication Path for Multicast GOOSE in PRP based Double-Ring

A detailed description of various LAN architectures was provided in Figure 3-6 and

Figure 3-8 in Chapter 3. The RBD models for two different communication

architectures are considered: the ring architecture and the IEC 62439-3 PRP based

double ring architectures. The IEC 61850-8-1 Generic Object Oriented Substation

Events (GOOSE) message, which is a multi-cast message, can be used to transmit

protection data over the LAN of a digital substation in milliseconds. Figure 5-4

shows the communication path of sending a multicast Ethernet frame from the local

bay to two recipients allocated in other bays via PRP LANs. RBD models for a 1-to-

2 communication service (e.g. multicast GOOSE message) in a PRP ring LAN are



developed as illustrated in Figure 5-5. Note that 10 bays are assumed in

each substation.

[Figure 5-5 diagram. Labels: Bay SW 1, Bay SW2; Station SW1, Station SW2; EM, EM×2; Bay#1 SW×2, Bay#2 SW×2, Station SW×2; panels (a) and (b)]

Figure 5-5: RBD to Assess the Dependability (a) and Security (b) of the PRP Ring LAN

3) SDH WANs:

The control centre (NCC) may require information from multiple substations at

different locations to initiate a specific remedial operation in a wide area network. The

reliability of the SONET WAN is affected by the number of substations required for the

SIPS application. The communication path in the WAN is similar to that in the double-ring LAN.

Figure 5-6 shows the RBD for a 1-to-2 communication path in a SONET ring WAN.

Note that 15 nodes are assumed in the SONET ring.

[Figure 5-6 diagram. Labels: Sub#1 RU, Sub#2 RU, NCC RU; Primary Ring: FI×11+RU×9; Backup Ring: FI×6+RU×3; panels (a) and (b)]

Figure 5-6: RBD for SONET WAN (a) Dependability and (b) Security

5.4. SIPS Reliability Assessment

5.4.1. Failure Mode and Effect Analysis

The first step is to determine all the possible failure modes of each component in the communication architecture model and then determine their overall impact on SIPS performance. In general, the failure of an individual SIPS component can lead to either a dependability-based maloperation (DBM) or a security-based maloperation (SBM). Four basic operational modes in which a component can reside are considered. Depending on the failure mechanism, some failure modes can be detected by the self-monitoring function embedded in the device. Each component operational mode and its possible impact on the overall performance of the SIPS are described as follows:

a) Normal State (State 0): In this state, components operate as designed and therefore

meet both dependability and security criteria. When all the components are in this

state, SIPS will operate as designed.

b) DBM, detectable (State 1): The component fails to deliver the designed function when it is required. This type of DBM failure can be detected by either self-testing or routine testing. Therefore, faulted devices can be repaired or replaced before they lead to a SIPS DBM.

c) DBM, undetected (State 2): The component fails to deliver the designed function when it is required. However, this type of component failure cannot be detected by either self-testing or routine testing. Consequently, operators are not notified of the failure of the component. This could lead to a SIPS failure to operate when needed if no redundancy is provided.

d) SBM (State 3): The component operates when it is not required. This can be caused

by either a spurious operation or hidden failures of the protection devices.

Eventually, it could lead to an unwanted operation of SIPS.

It is worth noting that not every SIPS component can be in each of the four operational modes. The possible failure modes need to be analysed based on the function and the failure mechanism of the studied component. For example, not all components can contribute to a security-based maloperation (SBM): the Ethernet Media (EM), such as a fibre optic cable, cannot generate a spurious trip signal by itself. In addition, some failures can be detected in a timely manner while others remain hidden.

SIPS have a complex architecture and comprise a number of functional modules and

each module consists of more than one basic element. Therefore, the reliability block

diagram (RBD) is used to combine individual components’ operating states and

determine the overall operational behaviours of SIPS. The combination of different


SIPS component states and system contingencies leads to several different overall SIPS

operational states, which can be categorized as follows:

a) SIPS Normal operation: SIPS operates correctly and promptly as designed.

The impact of a successful operation depends on the mitigation action of the

scheme. For example, when a Generator Rejection Scheme (GRS) operates as

designed, it trips a predefined generator. This will cause financial costs

associated with generator start-up and the re-dispatch of its output to other

generators during its outage.

b) SIPS DBM: SIPS fails to take action when it is required during system

contingencies. The consequence following a SIPS DBM is normally severe and

may have cascading impact on system operation.

c) SIPS SBM: SIPS operates when it is not required. Spurious operation signals

from SIPS component may lead to unwanted SIPS operation. The impact of

SBM is similar to a normal SIPS operation.

5.4.2. Markov Modelling

After determining all the possible failure modes of each SIPS component using FMEA, the probability of being in each state is estimated by Markov modelling. A 4-state Markov model, shown in Figure 5-7, was developed to capture all the possible failure modes coexisting in a SIPS component. It was then used to estimate the probability of being in, and the frequency of encountering, each state.

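[Figure 5-7 diagram. Four-state Markov model: State 0 (Normal Operation), State 1 (DBM, detected), State 2 (DBM, undetected), State 3 (Spurious Trip); failure rates λdd, λud, λst out of State 0 and repair rates µdd, µud, µst back to State 0; States 1 and 2 are labelled Pr(DBM) and State 3 is labelled Pr(SBM).]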

Figure 5-7: Markov Model for SIPS Component Reliability Assessment

Considering that a component's mean time to failure (MTTF) encompasses all of its failure modes, the following equations are used to estimate the failure rates associated with each operational mode:

$\lambda_{dd} + \lambda_{ud} + \lambda_{st} = \dfrac{1}{MTTF}$ (5-1)

$\lambda_{dd} : \lambda_{ud} : \lambda_{st} = \alpha : \beta : \gamma$ (5-2)

The component reliability data (i.e. mean time to failure (MTTF) data) used in this section are based on previous reliability assessment data, as shown in Table 3-3. Knowing the MTTF, the failure rate of each mode is determined by the parameters $\alpha$, $\beta$ and $\gamma$. Due to the self-monitoring capabilities of numerical devices, the majority of component failures can be detected in a timely manner. Therefore, it is assumed that the failure rate of detectable DBM is twice that of undetectable DBM, so that $\beta$ equals $0.5\alpha$. The failure rate of SBM is assumed to be the same as that of a detectable DBM failure, i.e. $\gamma$ equals $\alpha$. The repair rates of the detectable failures, µdd and µst, are equal to 398.2 year^-1, as faulty devices are required to be replaced within 22 hours as required by WECC [18]. The maintenance testing which can detect hidden failures is assumed to be carried out once every two years. This leads to a µud equal to 0.5 year^-1. Sensitivity analysis will be carried out to assess the impact of uncertainty in the reliability data on the simulation results.
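The rate assumptions above can be illustrated with a short Python sketch. It splits 1/MTTF across detectable DBM, undetectable DBM and spurious trip in the ratio 2 : 1 : 2 and converts the 22-hour replacement time into a repair rate. The MTTF of 100 years is a hypothetical placeholder (the actual values are in Table 3-3), and the function names are illustrative only.

```python
HOURS_PER_YEAR = 8760.0


def split_failure_rates(mttf_years, ratio=(2.0, 1.0, 2.0)):
    """Split the total failure rate 1/MTTF into (lambda_dd, lambda_ud, lambda_st).

    ratio is (detectable DBM : undetectable DBM : spurious trip).
    """
    total_rate = 1.0 / mttf_years  # Equation (5-1), in failures per year
    weight_sum = sum(ratio)
    return tuple(total_rate * w / weight_sum for w in ratio)


def repair_rate_per_year(mean_repair_hours):
    """Repair rate (per year) for a given mean repair time in hours."""
    return HOURS_PER_YEAR / mean_repair_hours


# Hypothetical component with MTTF = 100 years (assumption, not thesis data)
lam_dd, lam_ud, lam_st = split_failure_rates(mttf_years=100.0)

mu_dd = mu_st = repair_rate_per_year(22.0)  # replacement within 22 hours
mu_ud = 0.5  # hidden failures found by a test every two years

print(f"lambda_dd={lam_dd:.4f}, lambda_ud={lam_ud:.4f}, lambda_st={lam_st:.4f} per year")
print(f"mu_dd=mu_st={mu_dd:.1f} per year, mu_ud={mu_ud} per year")
```

The repair-rate conversion reproduces the quoted 398.2 year^-1 (8760 / 22 hours).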

Knowing the failure and repair rates, the transition probability matrix $B$ can be obtained according to Equation (5-3). The probability of being in each state after $m$ intervals, $P^{(m)}$, and the frequency of encountering each individual state, $f(S)$, can be calculated as:

$B = \begin{bmatrix} 1-(\lambda_{dd}+\lambda_{ud}+\lambda_{st}) & \lambda_{dd} & \lambda_{ud} & \lambda_{st} \\ \mu_{dd} & 1-\mu_{dd} & 0 & 0 \\ \mu_{ud} & 0 & 1-\mu_{ud} & 0 \\ \mu_{st} & 0 & 0 & 1-\mu_{st} \end{bmatrix}$ (5-3)

$P^{(m)} = [\Pr(S=0)\ \ \Pr(S=1)\ \ \Pr(S=2)\ \ \Pr(S=3)] = P^{(0)} B^{m}$ (5-4)

$f(S) = P(S)\,\lambda_{d}(S) = P(\bar{S})\,\lambda_{e}(S)$ (5-5)

where $B^{m}$ is the $m$-step transition matrix of the Markov model, $P(S)$ and $P(\bar{S})$ are the probabilities of being and not being in the state, $\lambda_{d}(S)$ represents the rate of departure from the state S and $\lambda_{e}(S)$ represents the rate of entry into the state S.
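A minimal Python sketch of this calculation is given below. It builds the matrix of Equation (5-3), propagates an initial state vector as in Equation (5-4), and compares the result with the closed-form long-run probabilities of this star-shaped chain. Two things are assumptions rather than thesis content: the rates are multiplied by a time step dt (in years) so that every matrix entry stays between 0 and 1, and the failure rates correspond to a hypothetical MTTF of 100 years.

```python
KEYS = ("dd", "ud", "st")  # detected DBM, undetected DBM, spurious trip


def transition_matrix(lam, mu, dt):
    """Discrete-time matrix B for a step of dt years (rates are per year)."""
    B = [[0.0] * 4 for _ in range(4)]
    B[0][0] = 1.0 - dt * sum(lam[k] for k in KEYS)
    for j, k in enumerate(KEYS, start=1):
        B[0][j] = dt * lam[k]       # State 0 -> failure state j
        B[j][0] = dt * mu[k]        # repair: state j -> State 0
        B[j][j] = 1.0 - dt * mu[k]  # remain in failure state j
    return B


def propagate(p0, B, m):
    """Return P(m) = P(0) B^m by repeated vector-matrix products."""
    p = list(p0)
    for _ in range(m):
        p = [sum(p[i] * B[i][j] for i in range(4)) for j in range(4)]
    return p


def steady_state(lam, mu):
    """Long-run probabilities: balance gives P_j = P_0 * lam_j / mu_j."""
    ratios = [lam[k] / mu[k] for k in KEYS]
    p0 = 1.0 / (1.0 + sum(ratios))
    return [p0] + [p0 * r for r in ratios]


# Hypothetical rates per year (MTTF = 100 years split 2:1:2)
lam = {"dd": 0.004, "ud": 0.002, "st": 0.004}
mu = {"dd": 398.2, "ud": 0.5, "st": 398.2}

B = transition_matrix(lam, mu, dt=1e-3)
p_m = propagate([1.0, 0.0, 0.0, 0.0], B, m=40_000)  # 40 simulated years
p_inf = steady_state(lam, mu)

pr_dbm = p_inf[1] + p_inf[2]  # States 1 and 2
pr_sbm = p_inf[3]             # State 3
# Frequency of encountering each failure state: P(S) times its departure rate
freq = [p_inf[j] * mu[k] for j, k in enumerate(KEYS, start=1)]

print("P(m)     :", [f"{x:.3e}" for x in p_m])
print("P(steady):", [f"{x:.3e}" for x in p_inf])
```

With a repair rate of 0.5 year^-1, the undetected DBM state takes several years to settle, which is why the sketch iterates over 40 simulated years before comparing with the closed form.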


5.4.3. Reliability Block Diagram

Because the Markov model has limited ability to address failures caused by combinations of subsystems, it is only used to analyse the performance of individual components. With the probability of each component being in each state determined using the Markov model, the impact on the overall performance of the SIPS is determined using the Reliability Block Diagram (RBD). With the RBD model built for each communication architecture, the Minimal Tie Set Method [84] is used to estimate the reliability of various GRS operations. A minimal tie set is a path set containing the minimum number of units needed to guarantee a connection between the input and output in the RBD. For a system to fail, all the tie sets must fail. For a given architecture, it is assumed that T1, T2, …, Tp are the minimal tie sets, Xi is the state of component i (i=1, …, n) and n is the number of system components. The reliability of the structure can be written as:

$\phi(X) = \coprod_{j=1}^{p} \prod_{i \in T_j} X_i = 1 - \prod_{j=1}^{p} \Bigl(1 - \prod_{i \in T_j} X_i\Bigr)$ (5-6)

For the sensor network with duplicated process bus as shown in Figure 5-3 (a), there are

four minimal tie sets:

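$$
\begin{aligned}
T_1 &= \{TS1,\ IT1,\ MU1,\ BIED1,\ SW1,\ IED1,\ EM{\times}5\} \\
T_2 &= \{TS2,\ IT1,\ MU1,\ BIED1,\ SW1,\ IED1,\ EM{\times}5\} \\
T_3 &= \{TS1,\ IT2,\ MU2,\ BIED2,\ SW2,\ IED2,\ EM{\times}5\} \\
T_4 &= \{TS2,\ IT2,\ MU2,\ BIED2,\ SW2,\ IED2,\ EM{\times}5\}
\end{aligned}
$$ (5-7)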

The dependability of the studied architecture can be calculated as:

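$$
\begin{aligned}
R_{sys} &= P(T_1 \cup T_2 \cup T_3 \cup T_4) \\
&= P(T_1) + P(T_2) + P(T_3) + P(T_4) - P(T_1 \cap T_2) - P(T_1 \cap T_3) \\
&\quad - P(T_1 \cap T_4) - P(T_2 \cap T_3) - P(T_2 \cap T_4) - P(T_3 \cap T_4) + P(T_1 \cap T_2 \cap T_3) \\
&\quad + P(T_1 \cap T_2 \cap T_4) + P(T_1 \cap T_3 \cap T_4) + P(T_2 \cap T_3 \cap T_4) - P(T_1 \cap T_2 \cap T_3 \cap T_4) \\
&= 4 P_{TS} P_{IT} P_{MU} P_{BIED} P_{SW} P_{IED} P_{EM}^{5} - 2 P_{TS}^{2} P_{IT} P_{MU} P_{BIED} P_{SW} P_{IED} P_{EM}^{5} \\
&\quad - 2 P_{TS} (P_{IT} P_{MU} P_{BIED} P_{SW} P_{IED} P_{EM}^{5})^{2} + (P_{TS} P_{IT} P_{MU} P_{BIED} P_{SW} P_{IED} P_{EM}^{5})^{2}
\end{aligned}
$$ (5-8)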

where P is the dependability of the component.
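The tie-set calculation can be checked numerically with the following Python sketch. It evaluates the probability that at least one of the four minimal tie sets is fully working by inclusion-exclusion and compares it with the closed-form expression for the duplicated process bus. The component probabilities are hypothetical placeholders, the component names are illustrative, and each chain is assumed to have its own five independent EM links.

```python
from itertools import combinations


def union_probability(tie_sets, p_work):
    """P(at least one tie set fully working), assuming independent components."""
    total = 0.0
    for k in range(1, len(tie_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(tie_sets, k):
            # All components in the union must work for every tie set in the group
            components = set().union(*group)
            prob = 1.0
            for name in components:
                prob *= p_work[name]
            total += sign * prob
    return total


def chain(n):
    """Components of process bus chain n, including its five EM links."""
    names = {f"{kind}{n}" for kind in ("IT", "MU", "BIED", "SW", "IED")}
    names |= {f"EM{n}_{k}" for k in range(1, 6)}
    return names


# The four minimal tie sets: each sensor (TS1, TS2) feeds both chains
tie_sets = [
    {"TS1"} | chain(1),
    {"TS2"} | chain(1),
    {"TS1"} | chain(2),
    {"TS2"} | chain(2),
]

# Hypothetical dependability per component type (not thesis data)
TYPE_P = {"TS": 0.99, "IT": 0.995, "MU": 0.995, "BIED": 0.99,
          "SW": 0.998, "IED": 0.99, "EM": 0.999}


def component_type(name):
    """Strip the numeric suffix, e.g. 'EM1_3' -> 'EM'."""
    return name.rstrip("0123456789_")


p_work = {name: TYPE_P[component_type(name)] for ts in tie_sets for name in ts}
r_tie = union_probability(tie_sets, p_work)

# Closed form: a = P_TS, c = P_IT * P_MU * P_BIED * P_SW * P_IED * P_EM^5
a = TYPE_P["TS"]
c = (TYPE_P["IT"] * TYPE_P["MU"] * TYPE_P["BIED"] * TYPE_P["SW"]
     * TYPE_P["IED"] * TYPE_P["EM"] ** 5)
r_closed = 4 * a * c - 2 * a**2 * c - 2 * a * c**2 + (a * c) ** 2

print(f"inclusion-exclusion: {r_tie:.6f}, closed form: {r_closed:.6f}")
```

The closed form also factorises as (2a − a²)(2c − c²), i.e. a parallel pair of sensors in series with a parallel pair of chains, which offers an independent check on the expansion.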


5.4.4. Reliability Assessment Results

The calculated probabilities of dependability-based maloperation and security-based

maloperation of the described sensor network architectures, LAN and WAN are shown

in the following table:

Table 5-1: Substation based Sensor Network Reliability Assessment Results

Comm. Arch.              Tripping Logic   SIPS Operational Phases
                                          Activation                 Arming
                                          Pr(DBM)      Pr(SBM)       Pr(DBM)      Pr(SBM)
Single Process Bus       Voting           1.65×10^-2   2.33×10^-5    2.25×10^-2   1.83×10^-5
Single Process Bus       Vetoing          2.86×10^-2   1.33×10^-5    2.25×10^-2   1.83×10^-5
Duplicated Process Bus   Voting           6.47×10^-3   3.16×10^-5    1.25×10^-2   2.66×10^-5
Duplicated Process Bus   Vetoing          3.86×10^-2   4.99×10^-6    1.25×10^-2   2.66×10^-5

The reliability assessment results for the sensor network indicate that the

implementation of duplicated process bus at substation bay level can significantly

increase the dependability of the activation and arming phases of the SIPS. However, it

may also compromise the performance in terms of security with increased Pr(SBM).

Upon receiving the tripping signals from duplicated bay IEDs, a voting tripping logic

design delivers better system dependability while a vetoing logic can effectively prevent

spurious trips. It can be seen that when a vetoing tripping logic is implemented,

duplication in the process bus system could reduce the dependability in detecting line

outage. Meanwhile, regular testing and timely replacement of faulty devices are vital in

keeping the devices in their normal operating state. The reliability of the single ring and

PRP ring LAN architectures and the SDH ring WAN architecture are estimated using

the RBD. It can be seen that the single ring LAN architecture may not be sufficient for

the SIPS application due to its low dependability. The reliability data are then used in

the numerical studies to illustrate the impact of the communication architectures on

SIPS performance.

Table 5-2: LAN and WAN Reliability Assessment Results

Comm. System              Pr(DBM)      Pr(SBM)
LAN   Single Ring LAN     9.68×10^-3   7.79×10^-6
LAN   PRP Ring LAN        9.38×10^-5   1.56×10^-5
WAN   SDH Ring            1.52×10^-4   1.98×10^-5


5.5. Risk Assessment Numerical Illustration: Analytical Method

In this section, the analytical SIPS risk assessment method is applied to assess an event-

based Generator Rejection Scheme (GRS) implemented in a 3-bus system as shown in

Figure 5-8. The GRS comprises two basic operational phases: the activation phase,

which continuously monitors the status of the two critical lines (i.e. Line 1 and Line 2),

and the arming phase, which monitors the load level at Load 3 and arms the GRS only

when it is higher than a certain level. DC power flow was used in this analysis. Table 5-

3 shows the system generation and load level of the test system. The thermal limit for

all the circuits is set to be 200 MW.

Figure 5-8: 3-Bus System with Generator Rejection Scheme (GRS)

Table 5-3: Generation Data of the 3-bus System

Bus   Generator   Capacity (MW)       Peak Load (MW)
                  Min       Max
B1    G1          175       500       170
B2    G2, G3      150       400       100
B3    -           -         -         500

5.5.1. GRS Operating Logic

The purpose of the GRS is to maximize the power transfer from the low-cost generators

at bus B2 to bus B3. Based on the DC power flow results, when the load level at Load 3


exceeds 400 MW, outage of either Line 1 or Line 2 will result in the overload and

cascade tripping of the other line. Hence, the GRS needs to be armed to trip G2 if a

critical line outage is detected and the output of G2 must be redispatched to G1.

Without GRS, or in the case of GRS DBM, and during high load demand conditions, an

outage of a critical line will lead to cascade tripping of the transmission lines

interconnecting to Load 3, which may eventually result in isolating the load from the

rest of the system. In addition, there is also a risk that the generators connecting to B2

would also be tripped due to the out-of-step condition, if this follows a significant

decrease in the load demand. The implementation of GRS could effectively mitigate the

overloading on the stressed circuit by redispatching the output of the tripped generator

at B2 to the generator at B1. When there is a fault on one of the critical lines (line 1 or

line 2), the fault should be cleared by opening the circuit breaker at both ends of the

circuit. The scheme will then be activated after receiving the trip signal from the

protection device. At the same time, if the scheme is armed, generator G2 will be

tripped to prevent the overload of the healthy transmission line between B2 and B3.

5.5.2. Analytical Risk Assessment Procedures

5.5.2.1. Identification of Initiating Events

The event-based GRS can be activated by two basic events, F1 and F2, which represent the outage of Line1 and Line2 respectively. Based on the IEEE RTS reliability data [85], the line outage rate (λline) is equal to 4.57×10^-5 hr^-1 (MTTF = 2.5 years). The probability of the basic event occurring in the next hour is obtained by modelling the time to line outage with an exponential distribution:

$\Pr(F_i) = 1 - e^{-\lambda_{line} t} = 5.71 \times 10^{-5}$ (5-9)

$\Pr(\bar{F}_i) = 1 - \Pr(F_i) = 0.99994$ (5-10)

where $\Pr(F_i)$ and $\Pr(\bar{F}_i)$ represent the probability of occurrence and non-occurrence of the initiating event Fi respectively.

Additionally, the GRS is only armed into service when the load level at Load 3 is higher

than a pre-specified value (Load 3>400MW), the probability of which can be estimated

by analysing the year-round load profile provided by the IEEE-RTS load model. With

the peak load at B3 being 500MW, the probability of load 3 being in a high load level


which requires the GRS to be armed is expected to be 12.45%. Therefore, the

probability of the scheme being armed Pr(L) is:

$\Pr(L) = 12.45\%$ (5-11)

Next, five initiating events are formed by considering all the possible combinations of the basic events:

E1: No line outage. GRS not armed.

$\Pr(E_1) = \Pr(\bar{F}_1) \times \Pr(\bar{F}_2) \times \Pr(\bar{L})$ (5-12)

E2: No line outage. GRS armed.

$\Pr(E_2) = \Pr(\bar{F}_1) \times \Pr(\bar{F}_2) \times \Pr(L)$ (5-13)

E3: One line outage. GRS not armed.

$\Pr(E_3) = [\Pr(F_1) \times \Pr(\bar{F}_2) + \Pr(\bar{F}_1) \times \Pr(F_2)] \times \Pr(\bar{L})$ (5-14)

E4: One line outage. GRS armed.

$\Pr(E_4) = [\Pr(F_1) \times \Pr(\bar{F}_2) + \Pr(\bar{F}_1) \times \Pr(F_2)] \times \Pr(L)$ (5-15)

E5: Both lines outage.

$\Pr(E_5) = \Pr(F_1) \times \Pr(F_2)$ (5-16)

Since the adjacent circuits are considered to be independent, the probability of

simultaneously losing both lines is negligible. Consequently, event E5 is not considered

in the case study. Among these initiating events, only E4 requires a GRS operation.
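As a quick numerical check of these combinations, the following Python sketch evaluates the five initiating-event probabilities. It uses the values quoted in the text, Pr(Fi) = 5.71×10^-5 and Pr(L) = 12.45%, and assumes independent line outages; the function and variable names are illustrative only.

```python
def initiating_event_probabilities(pr_f1, pr_f2, pr_l):
    """Probabilities of E1..E5 for independent line outages F1, F2 and arming L."""
    not_f1, not_f2, not_l = 1.0 - pr_f1, 1.0 - pr_f2, 1.0 - pr_l
    no_outage = not_f1 * not_f2
    one_outage = pr_f1 * not_f2 + not_f1 * pr_f2
    return {
        "E1": no_outage * not_l,   # no line outage, GRS not armed
        "E2": no_outage * pr_l,    # no line outage, GRS armed
        "E3": one_outage * not_l,  # one line outage, GRS not armed
        "E4": one_outage * pr_l,   # one line outage, GRS armed
        "E5": pr_f1 * pr_f2,       # both lines out
    }


events = initiating_event_probabilities(pr_f1=5.71e-5, pr_f2=5.71e-5, pr_l=0.1245)

for name, prob in events.items():
    print(f"Pr({name}) = {prob:.4e}")
```

The five events are mutually exclusive and exhaustive, so their probabilities sum to one, and Pr(E5) is of the order of 10^-9, which is consistent with neglecting it in the case study.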

5.5.2.2. Formulate GRS Risk Expression

The probability of a system initiating event combined with a particular GRS operation state can be denoted as $\Pr(E_i \cap (T, \bar{T}))$, where $(T, \bar{T})$ denotes whether the GRS operates or not. The following situations are then considered (“Act1” and “Act2” represent the line-outage detection systems of Line1 and Line2 respectively, and “Arm” represents the load monitoring system used to arm the GRS). Im(Normal), Im(DBM) and Im(SBM) represent respectively the impact of SIPS normal operation, DBM and SBM on system operation under a particular system condition:


1) Situation 1 ($E_1 \cap T$): Unwanted GRS operation during initiating event E1. The GRS operates when it is not armed and there is no fault on line 1 or line 2. This requires security-based misoperation in both the activation and arming phases.

$\Pr(E_1 \cap T) = \Pr(E_1) \times [\Pr(\mathrm{SBM\_Act1}) + \Pr(\mathrm{SBM\_Act2})] \times \Pr(\mathrm{SBM\_Arm})$ (5-17)

$Im(E_1 \cap T) = Im(SBM)$ (5-18)

2) Situation 2 ($E_2 \cap T$): Unwanted GRS operation when it is armed but without circuit outage in the system. This situation is caused by SBM in either of the two line-outage detection systems.

$\Pr(E_2 \cap T) = \Pr(E_2) \times [\Pr(\mathrm{SBM\_Act1}) + \Pr(\mathrm{SBM\_Act2})]$ (5-19)

$Im(E_2 \cap T) = Im(SBM)$ (5-20)

3) Situation 3 ($E_3 \cap T$): Unwanted GRS operation when a fault occurs on one circuit, but the load level is lower than the triggering point. GRS trips due to SBM in the arming phase.

$\Pr(E_3 \cap T) = \Pr(E_3) \times \Pr(\mathrm{SBM\_Arm})$ (5-21)

$Im(E_3 \cap T) = Im(SBM)$ (5-22)

4) Situation 4 ($E_4 \cap \bar{T}$): A fault occurs on one critical circuit while the load level at Load3 exceeds 400MW. GRS fails to operate due to dependability-based misoperation (DBM) in either the arming or the activation phase.

$\Pr(E_4 \cap \bar{T}) = \Pr(E_4) \times [\Pr(\mathrm{DBM\_Act}) + \Pr(\mathrm{DBM\_Arm})]$ (5-23)

$Im(E_4 \cap \bar{T}) = Im(DBM)$ (5-24)

5) Situation 5 ($E_4 \cap T$): A fault occurs on one critical circuit while Load3 exceeds 400MW. GRS operates as designed.

$\Pr(E_4 \cap T) = \Pr(E_4) \times [1 - \Pr(\mathrm{DBM\_Act}) - \Pr(\mathrm{DBM\_Arm})]$ (5-25)

$Im(E_4 \cap T) = Im(Normal)$ (5-26)

The impact of GRS normal operation, DBM and SBM is estimated as described in

Table 5-4 by considering the corresponding consequence and financial impact [74, 86].

In the case of successful GRS operation, generator2 (G2) connected to Bus B2 will be

tripped by the scheme for 2 hours. As illustrated in equation (5-27), the financial cost is


associated with the start-up of G2 and the redispatch of G2’s output to other generators

during the next two hours. In the case of DBM, during high loading conditions, outage

of any critical line (i.e. line 1 or 2) will result in overloading and cascade tripping of the

transmission lines interconnecting Load3, isolating it from the generation plants. Outage

of the critical lines will also cause an out of step condition for all the generators

connected to Bus B2. It is assumed that all the generators at the plant (G2 and G3) will

accelerate and then be tripped due to over-speed. Load2 will then be supplied by G1.

The impact caused by SBM will be exactly the same as the impact of normal operation.

Table 5-4: Impact Assessment for GRS Misoperation [74, 86]

Cost Items        Quantity (MW)      Duration (hrs)   Unit Cost

Successful Operation
Unit start-up     G2                 -                5,000 $/Case
Re-dispatch       PG2 = 194.5        2                50 $/MWh

Dependability-based Misoperation
Load shedding     PLOAD_3 = 424      2                18,000 $/MWh
Unit start-up     G2, G3             -                5,000 $/Case
Re-dispatch       PLOAD_2 = 100      2                50 $/MWh

Security-based Misoperation
Unit start-up     G2                 -                5,000 $/Case
Re-dispatch       PG2 = 194.5        2                50 $/MWh

Based on the assessment, the impact of each GRS state can be calculated as:

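$Im(Normal) = 50 \times 2 \times P_{G2} + 5000 = \$24{,}455$ (5-27)

$Im(DBM) = 18000 \times 2 \times P_{LOAD3} + 50 \times 2 \times P_{LOAD2} + 10000 = \$15{,}284{,}000$ (5-28)

$Im(SBM) = 50 \times 2 \times P_{G2} + 5000 = \$24{,}455$ (5-29)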
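The impact expressions can be evaluated directly from the cost items in Table 5-4, as in the Python sketch below. The function and parameter names are illustrative only. With the rounded PG2 = 194.5 MW from the table, the normal-operation and SBM impact evaluates to $24,450, marginally below the quoted $24,455, which suggests an unrounded PG2 was used for the quoted figure; the DBM impact reproduces $15,284,000.

```python
def impact_normal(p_g2_mw, hours=2.0, redispatch_cost=50.0, startup_cost=5000.0):
    """Cost of a GRS trip of G2: re-dispatch of its output plus one unit start-up."""
    return redispatch_cost * hours * p_g2_mw + startup_cost


def impact_dbm(p_load3_mw, p_load2_mw, hours=2.0, shedding_cost=18000.0,
               redispatch_cost=50.0, startup_cost=5000.0, units_tripped=2):
    """Cost of a GRS failure: lost load at Load3, re-dispatch of Load2, two start-ups."""
    return (shedding_cost * hours * p_load3_mw
            + redispatch_cost * hours * p_load2_mw
            + startup_cost * units_tripped)


im_normal = impact_normal(194.5)
im_sbm = impact_normal(194.5)  # an unwanted trip costs the same as a wanted one
im_dbm = impact_dbm(424.0, 100.0)

print(f"Im(Normal) = ${im_normal:,.0f}")
print(f"Im(DBM)    = ${im_dbm:,.0f}")
print(f"Im(SBM)    = ${im_sbm:,.0f}")
```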

The risks induced by GRS can be calculated as:

$Risk(Normal) = \Pr(E_4 \cap T) \times Im(Normal)$ (5-30)

$Risk(DBM) = \Pr(E_4 \cap \bar{T}) \times Im(DBM)$ (5-31)

$Risk(SBM) = [\Pr(E_1 \cap T) + \Pr(E_2 \cap T) + \Pr(E_3 \cap T)] \times Im(SBM)$ (5-32)
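The five situations and the three risk expressions can be combined in a short Python sketch. The maloperation probabilities are taken from the duplicated process bus with voting logic in Table 5-1, with the same activation figures assumed for both line-outage detection systems. The initiating-event probabilities use the quoted Pr(Fi) and Pr(L), and all names are illustrative. The numbers are meant to show the mechanics of the calculation and are not expected to reproduce Figure 5-10.

```python
def grs_risks(pr_e, sbm_act1, sbm_act2, sbm_arm, dbm_act, dbm_arm,
              im_normal, im_dbm, im_sbm):
    """Return (situation probabilities, risks) for the GRS, per hour."""
    situations = {
        "S1": pr_e["E1"] * (sbm_act1 + sbm_act2) * sbm_arm,  # SBM in both phases
        "S2": pr_e["E2"] * (sbm_act1 + sbm_act2),            # SBM in activation
        "S3": pr_e["E3"] * sbm_arm,                          # SBM in arming
        "S4": pr_e["E4"] * (dbm_act + dbm_arm),              # GRS fails to operate
        "S5": pr_e["E4"] * (1.0 - dbm_act - dbm_arm),        # GRS operates as designed
    }
    risks = {
        "Normal": situations["S5"] * im_normal,
        "DBM": situations["S4"] * im_dbm,
        "SBM": (situations["S1"] + situations["S2"] + situations["S3"]) * im_sbm,
    }
    return situations, risks


# Initiating events from the quoted Pr(F_i) = 5.71e-5 and Pr(L) = 12.45%
pr_f, pr_l = 5.71e-5, 0.1245
no_outage = (1.0 - pr_f) ** 2
one_outage = 2.0 * pr_f * (1.0 - pr_f)
pr_e = {
    "E1": no_outage * (1.0 - pr_l),
    "E2": no_outage * pr_l,
    "E3": one_outage * (1.0 - pr_l),
    "E4": one_outage * pr_l,
}

situations, risks = grs_risks(
    pr_e,
    sbm_act1=3.16e-5, sbm_act2=3.16e-5, sbm_arm=2.66e-5,  # Table 5-1, duplicated/voting
    dbm_act=6.47e-3, dbm_arm=1.25e-2,
    im_normal=24455.0, im_dbm=15284000.0, im_sbm=24455.0,
)

for name, value in risks.items():
    print(f"Risk({name}) = {value:.3f} $/hr")
```

Because every probability refers to the next hour, the resulting risks are in dollars per hour.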

The probabilities used in the GRS risk expression can also be estimated using fault tree analysis (FTA). For example, the probability of GRS DBM when the fault occurs (Situation 4), $\Pr(E_4 \cap \bar{T})$, can be estimated using the fault tree shown in Figure 5-9. The top event occurs when the scheme is in the DBM state whilst GRS operation is required. GRS DBM is caused by the absence of line-outage identification (activation phase) or of the arming signal (arming phase), while GRS operation is required when there is a fault on Line1 or Line2 (with probabilities $\Pr(F_1)$ and $\Pr(F_2)$) and the load level at Load3 exceeds 400MW.

Figure 5-9: Fault Tree Analysis (FTA) to Assess the Probability of GRS DBM

5.5.3. Analytical Risk Assessment Results

The analytical assessment method is applied to evaluate the performance of the local SIPS. Two different sensor network communication architectures, as illustrated in Figure 3-9 in Chapter 3, are considered, while the reliability of the wide area communication networks is not addressed. The scheme risk induced by the communication architectures with different tripping logics is shown in Figure 5-10.

Figure 5-10: GRS Risk Assessment Results for Different Sensor Network Architectures

It can be seen that the risks caused by GRS normal operation (“Operating Cost”) stay approximately the same for all four GRS designs. DBM risks are the main operating risk for all the studied designs. This is because, in the three-bus system, DBM of the GRS leads to a complete loss of load at Load 3, which has a considerably larger impact than an SBM. Moreover, the use of the vetoing logic delivers better performance in terms of SBM. However, it also leads to a noticeable increase in the risk of DBM, especially for the scheme with a high level of redundancy (e.g. Arch2 vetoing). Therefore, implementing process-level communication system redundancy requires careful consideration and does not necessarily lead to a more reliable scheme.

The previous assessment considers a GRS with the line-outage information acquired from the local substation. With telecommunication facilities between substations, line-outage detection signals from the remote terminals of a transmission line can be acquired by the scheme as an intertripping (I/T) signal, which could significantly enhance the dependability of the scheme. Figure 5-11 compares the risks of the previously assessed GRS designs (Arch2 voting and vetoing) with the risks of a GRS using the intertripping signal from the remote end of the transmission line. It can be seen that monitoring the line status at both ends of the line reduces the DBM risk. However, as a trade-off, it leads to an increased risk of SBM for both designs. In addition, transmitting signals from the remote terminal to the local scheme programmable logic controllers (PLC) may introduce additional time delay in the GRS decision making.

Figure 5-11: GRS Risk Comparison with and without Intertripping (I/T) Signal

5.6. Sensitivity Study

Due to the high uncertainty of the data used in the reliability assessment, sensitivity

analysis is carried out to determine the impact of the variation in the assumed data on

the risk evaluation results. Moreover, sensitivity analysis can also determine the

weakest operational phase in GRS, which needs to be improved to fulfil the reliability


requirements. The factors affecting the GRS risks and the selection of the optimal

design are listed as follows:

a) The reliability of electronic components: mean time to failure (MTTF) and mean

time to repair (MTTR) data.

b) The frequency of scheme initiating event (e.g. critical line outage rate λline for

GRS).

c) The probability of the system being in its overloading state (Pr(L)).

The first factor focuses on the reliability of the SIPS infrastructure and its maintenance strategy, while the last two factors consider the impact of Power System conditions on the assessment results. In particular, the wide range method is used: the value of each factor is varied over a wide range to examine its impact on the risk assessment results. The sensitivity analysis results are then used to determine the optimal scheme design under different system conditions.

5.6.1. Impact of Component Reliability on GRS Risk

The reliability of the electronic components used in the SIPS affects the successful arming process and the detection of the scheme initiating events. The Mean Time to Failure (MTTF) is affected by many factors such as vendor, age, weather, etc. The MTTF and MTTR of the SIPS components are varied from 0.1 to 10 times their original values in Figure 5-12. Changes in MTTF and MTTR affect the failure rate (λ) and repair rate (µ) of a component, as illustrated for the failure rate in Equation (5-1). Hence, the probabilities of a component being in each operational state in the Markov Model also vary. It can be observed that the overall scheme risk reduces as the MTTF increases, whilst an increase in MTTR (a decrease in µ), which means less frequent inspection and maintenance, leads to higher scheme risks.
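The direction of both trends can be illustrated at component level with a small Python sketch of the wide range method. It scales the MTTF or the MTTR of a single component between 0.1 and 10 times a base value and evaluates the long-run probability of the component being in a DBM state, using the closed-form solution of the 4-state model. The base rates correspond to a hypothetical MTTF of 100 years and the repair rates quoted in Section 5.4.2; the function name is illustrative.

```python
def dbm_state_probability(mttf_scale=1.0, mttr_scale=1.0):
    """Long-run probability of a component being in State 1 or State 2."""
    # Hypothetical base rates per year; scaling MTTF divides the failure rates,
    # scaling MTTR divides the repair rates
    lam = {"dd": 0.004 / mttf_scale, "ud": 0.002 / mttf_scale, "st": 0.004 / mttf_scale}
    mu = {"dd": 398.2 / mttr_scale, "ud": 0.5 / mttr_scale, "st": 398.2 / mttr_scale}
    ratios = {k: lam[k] / mu[k] for k in lam}
    p0 = 1.0 / (1.0 + sum(ratios.values()))
    return p0 * (ratios["dd"] + ratios["ud"])


scales = (0.1, 0.5, 1.0, 2.0, 10.0)
by_mttf = [dbm_state_probability(mttf_scale=k) for k in scales]
by_mttr = [dbm_state_probability(mttr_scale=k) for k in scales]

for k, p_f, p_r in zip(scales, by_mttf, by_mttr):
    print(f"scale {k:>4}: Pr(DBM | MTTF x k) = {p_f:.3e}, Pr(DBM | MTTR x k) = {p_r:.3e}")
```

In this sketch a longer MTTF lowers the DBM probability monotonically, while a longer MTTR raises it, which matches the direction of the trends described for the scheme risk.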

Variations in component reliability data do not affect the selection of the sensor network, since the risk introduced by the GRS using vetoing logic and the intertripping signal remains the lowest. However, keeping device reliability high and carrying out more frequent inspections prove to be effective ways of enhancing the performance of the GRS. Figure 5-13 illustrates the improvement in GRS (Arch2, voting) performance achieved by enhancing the reliability or maintenance of different GRS operational phases. The entry points at which the GRS risk falls below a certain level (1 $/hr) for the different strategies are shown in Table 5-5.

Figure 5-12: Impact of MTTF and MTTR on Risks of Different GRS Designs

Increasing the MTTF of either the activation phase or the arming phase or both of them

could deliver enhanced overall GRS performance. For example, the total scheme risk

drops from 1.2 $/hr to 1 $/hr at approximately 1.4×MTTFbase for enhancing both

operational phases, 1.6×MTTFbase for the activation phase and 3.9×MTTFbase for the

arming phase. The increase in system MTTR means a lower frequency in scheme

testing and maintenance. Therefore, it leads to higher scheme risks. Maintenance on the

activation phase has a more significant impact on enhancing system reliability as

compared with the arming phase.


Table 5-5: Entry Point for GRS Risk to Reach below 1$/hr

SIPS Phase     Times (×MTTF_base)   Times (×MTTR_base)
Activation     1.6                  0.65
Arming         3.9                  0.25
Both Phases    1.4                  0.77

Figure 5-13: Impact of Reliability of each GRS Phase on Overall Risks for Local GRS

Left: Enhancing MTTF on GRS (Arch2, Voting) Risk

Right: Enhancing MTTR on GRS (Arch2, Voting) Risk

5.6.2. Impact of System Conditions on GRS Risk

The risk of the studied event-based GRS also varies with system conditions such as the line-outage rate (λline) and system load levels. Variation in the line-outage rate λline affects the frequency of the GRS being triggered. The reliability of a transmission line can be affected by many factors, such as weather conditions, power flow, terrain, etc. Figure 5-14 demonstrates the impact of the line-outage rate on different GRS designs. A higher line-outage rate leads to an increased overall risk for all the GRS designs. Variation in the line-outage rate has very little impact on the risk caused by security-based misoperation (i.e. Risk(SBM)). However, the risk of DBM and the risk of normal operation increase linearly with the line-outage rate. Consequently, when the critical lines are highly reliable, the risk of SBM becomes the main contributor to the overall GRS risk. Conversely, for unreliable lines, Risk(DBM) and Risk(Normal) significantly affect the overall GRS risk. More specifically, when the line-outage rate is less than 0.7 times the base value (i.e. λline-new < 0.7×λline), schemes using only the local line-outage information perform better than schemes using both local and remote line-outage signals. However, when the scheme is required to operate frequently, it is more effective to have a SIPS design with better dependability.
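The near-linear dependence of Risk(DBM) and Risk(Normal) on the line-outage rate, and the near-constant Risk(SBM), follow directly from the risk expressions of Section 5.5.2. The Python sketch below sweeps the hourly line-outage probability around the quoted base value of 5.71×10^-5, which for such small values is practically equivalent to scaling λline. The maloperation probabilities are the duplicated process bus, voting figures from Table 5-1, assumed identical for both lines. The function name is illustrative and the output is not intended to reproduce Figure 5-14.

```python
def grs_risk(pr_f, pr_l=0.1245, dbm=6.47e-3 + 1.25e-2, sbm_act=3.16e-5,
             sbm_arm=2.66e-5, im_normal=24455.0, im_dbm=15284000.0, im_sbm=24455.0):
    """Risk components ($/hr) for a given hourly line-outage probability pr_f."""
    no_outage = (1.0 - pr_f) ** 2
    one_outage = 2.0 * pr_f * (1.0 - pr_f)
    pr_sbm = (no_outage * (1.0 - pr_l) * 2.0 * sbm_act * sbm_arm  # Situation 1
              + no_outage * pr_l * 2.0 * sbm_act                  # Situation 2
              + one_outage * (1.0 - pr_l) * sbm_arm)              # Situation 3
    return {
        "Normal": one_outage * pr_l * (1.0 - dbm) * im_normal,
        "DBM": one_outage * pr_l * dbm * im_dbm,
        "SBM": pr_sbm * im_sbm,
    }


base_pr_f = 5.71e-5
sweep = {k: grs_risk(k * base_pr_f) for k in (0.1, 0.5, 1.0, 2.0, 10.0)}

for k, risk in sweep.items():
    print(f"{k:>4} x base: Normal={risk['Normal']:.3f}  "
          f"DBM={risk['DBM']:.3f}  SBM={risk['SBM']:.3f} $/hr")
```

Doubling the outage probability roughly doubles the DBM and normal-operation risks, while the SBM risk is dominated by the no-outage, armed situation and barely moves.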


Figure 5-14: Impact of Critical Line Outage Rate on GRS Risks

Variations in the Load 3 annual peak load might change the GRS risk in two different ways. Firstly, the frequency of GRS operation changes because the probability of the scheme being in the armed state changes. Secondly, the financial impact of losing Load3 changes because the amount of load lost changes. Figure 5-15 shows the variation in GRS risks with the annual peak load at Load3. Load1 and Load2 are kept fixed in this wide range analysis. When the Load3 annual peak value is relatively low, Risk(SBM) becomes the main contributor to the scheme risks. Consequently, scheme designs with better security deliver better overall performance. However, for higher Load3 levels, the GRS is required to operate more frequently, resulting in higher DBM risks. Under these circumstances, introducing redundancy in the communication system or in the activation signal could minimize the risk from DBM.

Figure 5-15: Impact of Load Level on GRS Risks


5.7. Summary

A SIPS risk assessment procedure based on FMEA, Markov Model and Reliability

Block Diagram is described in this chapter and applied to a portion of the IEEE

Reliability Test System with GRS logic. The probability of different SIPS operational

states and their impact on system integrity are evaluated. The procedures are used to

quantify the risks of GRS with different sensor network architectures and tripping logics.

The optimal design can be determined by comparing the annual cost of GRS.

Meanwhile, a relatively low variation in scheme operational risk is also vital in a system

with continuously changing system conditions.

By comparing the performance of GRS implemented on different sensor network architectures, it can be concluded that the implementation of duplicated process bus communication systems may not necessarily lead to better performance, since it also causes a noticeable decrease in scheme security. The risk brought by scheme SBM can be effectively controlled by the use of vetoing tripping logic applied to the redundant line-outage detection IEDs. In addition, enhanced performance can be provided by a centralized GRS as compared with a local GRS. The proposed methodology can help utilities understand the impact of ICT on SIPS performance and how the scheme architecture can be designed to balance the trade-off between SIPS dependability and security.

Applying the sensitivity study to the factors governing the scheme performance provides useful guidance for utilities in allocating inspection and maintenance effort. The wide range method can be used to assess the effectiveness of different reliability enhancement strategies in minimizing the risk following undesirable SIPS operations. It helps identify the most critical factor for dependable and secure scheme operation. Moreover, by assessing the variation in the operational risk under different system conditions, the scheme operator could select the optimal GRS logic design based on the current system condition.

CHAPTER 6

RISK OF IMPLEMENTING SIPS IN A

SYSTEM WITH LARGE-SCALE WIND

INTEGRATION

6.1. Future UK Power System

The requirement to decrease the carbon intensity of the electricity system calls for an increasing penetration of renewable energy and the removal of generation based on fossil fuels. The renewable generation being connected to the system, notably wind energy, is bringing significant challenges to the transfer capability of the transmission network. In addition, the integration of wind farms generally means that the sources are remote from the load, and the existing transmission lines are normally expected to operate closer to their operating limits, particularly when the wind intensity is high. SIPS applications, motivated by wind energy, become increasingly attractive in long-term system planning to mitigate the impact of cascading events triggered by extreme contingencies, because of the relatively low cost of SIPS as compared with transmission expansion. To effectively assess the risk of SIPS operation, there is a need to address the changes expected in the future Power System in the risk assessment model [87]. The Great Britain Power System involves significant deployment of wind generation in Scotland and off-shore, whilst the demand is mainly in Southern England. A generator rejection scheme (GRS) is a commonly used SIPS implemented to enhance the utilization of a transmission corridor. Hence, the UK National Grid’s documents, such as the Electricity Ten Year Statement (ETYS) and the System Operability Framework (SOF), were reviewed to incorporate the changes expected in the system operating conditions into the test system [81, 88].

6.1.1. Future Energy Scenarios and Wind Generation

In the future, the Power System in Great Britain is going to involve a significant

deployment of wind generation in the North and off-shore. This section provides a more

detailed description of the generation backgrounds and outlines the key changes in the

composition of generation expected in the next 20 years. The future energy scenarios

were first defined by the UK government’s former Department of Energy & Climate

Change (DECC) and are updated annually in consultation with stakeholders. They outline

a few alternative directions for system development. These scenarios provide a useful

source of information about future system performance and have been termed:

Consumer Power, Gone Green, Slow Progression and No Progression. They are

significantly different from each other but all meet the requirement for security of

supply with sufficient generation capacity.

A significant increase in the percentage of capacity associated with renewables can be

expected in the next 20 years for all four scenarios. For the highest scenarios (i.e. Gone

Green and Consumer Power), the penetration of renewables could reach 41% in 20

years. Even for the scenario with the slowest progress (i.e. No Progression), 26% of

the UK’s electricity will come from renewable sources by 2036 to meet the target. At the

same time, the proportion of conventional generations from coal and gas continues to

decrease for the coming years. The trend of generation mix for the Gone Green scenario

is illustrated in Figure 6-1 and is normally selected for the future system performance

study due to its particularly high penetration of renewable generation and resulting low

system inertia. In particular, the proportion of wind energy could occupy 17.5% of the

total electrical generation, which is approximately 10.5 GW. Therefore, the high

penetration of wind energy stimulates the application of generator rejection scheme

which trips the non-priority generation (e.g. wind generation) during overloading.

Although the variability of renewable sources can be mitigated by integrating



generation over large areas [89, 90], there is still a potential to encounter extreme

system conditions. For example, according to [89], 50% of UK wind turbines might

experience zero-output events coincidentally, with an average of 100 hours per year.

During a winter with low-wind speed and low-temperatures, the expected high amount

of load demand could require significant reserve generation capacities. Consequently,

the intermittent nature of the renewable generations has to be reflected in the risk

assessment model.

Figure 6-1: Gone Green Transmission Generation Mix [81]

6.1.2. Load Profiles

Due to the growth in weather-dependent distributed generation (DG), the transmission

demand is becoming more variable. Large generators and other interconnectors have to

be more flexible in order to accommodate the variation in transmission demand. In

addition, the use of smart meters raises the prospect that domestic customer demand can be

responsive to changes in supply capacities and financial cost of bought energy. It also

facilitates the prediction of the transmission demand which provides more precise

information on how much capability is needed and how often it is required. The

proposed risk assessment could then focus on extreme demand conditions.

In the year of 2016, the UK transmission demand varied from 17 GW (i.e. summer

minimum) to 52 GW (i.e. winter peak). Typical daily demand profiles for the four



different future energy scenarios are illustrated in Figure 6-2. The variation in the

transmission demand between different scenarios becomes more noticeable over the

next decades. The growth in distributed solar generation causes transmission demand to

be suppressed during the middle of the day when the sunshine reaches its peak value.

This phenomenon is more significant during summer days with strong sunshine and

among the future energy scenarios with greatest distributed solar generation growth (i.e.

Gone Green, Consumer Power). In the future, the continuous growth in renewable

distributed generation is expected to result in a greater change in the magnitude of

transmission demand over a smaller timespan. This may significantly affect the risk

associated with the protection schemes. Consequently, a more flexible protection

strategy is required to adapt to the changing system conditions.

Figure 6-2: Variation in Daily Load Profile for Different Energy Scenario [81]

6.1.3. Transmission Line Reinforcements

With the continuously changing GB energy landscape, as described in the previous

sections, the National Electricity Transmission System (NETS) will face significant

challenges in fulfilling the transfer capability. The potential deficits in network capacity

can be caused by the following reasons [81]:



1) Large amount of wind generation connected to the Scottish networks significantly

increases the transmission capacity requirement from Scotland into England.

2) An increasing amount of low carbon generation and interconnectors in the

northern England region increases the export requirement into the English

Midlands.

3) The West Midlands region will have to import more power from distant regions

due to the reduction in conventional generation capacity in the Midlands.

4) Growth in the power transfer from offshore wind generation on east coast to the

southern regions will stress the southern NETS. Meanwhile, the interconnectors

are placing increased stress on the Southern English network when exporting

power out of GB.

To develop the British network in an efficient, coordinated and economic way, the

future requirements and the present capability and deficits of the NETS are assessed in

the Network Option Assessment (NOA) report [91]. For each defined boundary,

transmission expansion options for each energy scenario are determined based on both

current transmission capability and future generation integration.

According to the NOA, the “commercial and non-build” option is another way to

reinforce the transmission capacities. This includes initiative strategies such as: 1)

network support from demand side response and distributed resources (e.g. embedded

generation, load and storage), 2) using active network management on generation and

demand such as inter-trips, 3) providing reactive power services at certain locations.

However, the corresponding reliability requirement and risk assessment procedures are

required to evaluate the possible additional risks caused by these operational strategies.

6.2. Stochastic Risk Assessment Procedures

The continuously changing GB energy landscape will bring significant challenges to

system integrity in the future years. The rapid increase in the proportion of renewables

will require utilities to more frequently trip non-priority generation, such as gas, coal

and wind generation, to ease transmission congestions. Therefore, future Power Systems

need to be equipped with more SIPS, and these would mainly be used for generation

rejection, to ensure the reliability requirements are satisfied and increase the capability

to integrate more renewables.



As reviewed in Chapter 5, most of the SIPS reliability assessment methods are

analytical. This becomes insufficient to reflect the impact of fast changing system

conditions on the SIPS operational risks. In addition, the use of wide-area SIPS or

centralised SIPS, made available by fast developing ICT, allows the SIPS decision to be

made based on measurements collected over a wide area network. This significantly

increases the complexity in SIPS operation and also requires the reliability assessment

method to address the impact of SIPS on a wide area system.

In this chapter, a method based on Sequential Monte Carlo Simulation (SMCS) is

proposed and applied for the first time to assess SIPS operational risk. This method allows

dynamic SIPS risk assessment, taking into account the time-varying nature of the generation

and load demand. The SMCS method is best suited for the time dependent SIPS events

and is used to simulate the time series feature of the load profile and the wind output. A

dynamic annual hourly load profile based on the IEEE-RTS load model and the wind

farm output profiles predicted using the auto-regressive and moving averages (ARMA)

model are mapped into the SMCS test procedure to reflect the random behaviour of the

system generation and demand. The probability and the frequency of each SIPS

operational state obtained from the reliability assessment are mapped into the model to

represent different scheme operational behaviours, e.g. normal operation, DBM and

SBM.

As shown in Figure 6-3, the scheme is triggered by an initiating event, which is

normally a critical circuit outage or overloading, and the GRS is designed to prevent

cascading tripping. The time to a line outage event (Ei) is usually approximated

by an exponential distribution. Once the line outage rate (λline) is known, the probability

of the occurrence of the initiating event in the next hour can be obtained as:

$$\Pr(E_i) = 1 - e^{-\lambda_{line} t} \tag{6-1}$$
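As a rough numerical illustration of Equation (6-1), the sketch below evaluates the probability of at least one outage within a time window when the time to outage is exponentially distributed. The function name and the example rate of 0.5 outages per year are assumptions chosen for illustration; they are not values taken from this study.

```python
import math

HOURS_PER_YEAR = 8760


def initiating_event_probability(outage_rate_per_year, hours=1.0):
    """Probability of at least one line outage within `hours`, as in Eq. (6-1)."""
    rate_per_hour = outage_rate_per_year / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * hours)


# Hypothetical outage rate of 0.5 occurrences per year (illustrative only)
p_next_hour = initiating_event_probability(0.5)
```

For small rates the result is close to the hourly rate itself, which is why the per-hour sampling in the SMCS can treat the initiating event as a rare event.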



[Figure 6-3 flowchart. Block labels: Start; SIPS Reliability Data (Pr(SO), Pr(DBM), Pr(SBM)); Wind Generation Profile; Load Profile; Network & SIPS Status Mapping (at time t); Triggering Event? (Pr(E)); SIPS Operation?; Success Operation; SIPS DBM; SIPS SBM; DC Optimal Power Flow (OPF); Converge?; Load Shedding; Impact Assessment (i.e. VOLL, LOWG, Energy Redispatch etc.); Risk Assessment: Risk(SO) = Pr(SO) × Im(SO), Risk(DBM) = Pr(DBM) × Im(DBM), Risk(SBM) = Pr(SBM) × Im(SBM); SMCS Stop Criteria; t = t + Δt; Stop; End.]

Figure 6-3: Risk Assessment Procedure using SMCS

A set of different system conditions with different weather and load levels is produced

by the year-round risk assessment. The impact of each SIPS operational state under a

certain system condition is:

$$Impact = \frac{1}{N}\sum_{i=1}^{N} Im(g_i, d_i) = \frac{1}{N}\sum_{i=1}^{N}\left( C_L^i(d_i) + C_G^i(g_i) + C_R^i(g_i, d_i) \right) \tag{6-2}$$

where N is the number of the samples within a year-round simulation and $Im(g_i, d_i)$

computes the impact of the GRS maloperation as a function of the generation output

and the load level. The parameters $C_L^i(d_i)$, $C_G^i(g_i)$ and $C_R^i(g_i, d_i)$ represent the cost

associated with load shedding, wind curtailment and generating capacity redispatch.

In each of the operating hours during a year period, the Value of Lost Load (VOLL), the

Loss of Wind Generation (LOWG) and the capacity of dispatched generation from other

parts of the system (LOWG-VOLL), incurred after each particular SIPS operation, are

calculated. The optimal power flow is performed based on the system conditions at the



time i. These are then used to calculate the three parameters using the cost figures given

in Table 5-4:

$$C_L^i(d_i) = VOLL(i) \times 18000 \times t \tag{6-3}$$

$$C_G^i(g_i) = LOWG(i) \times 120 \times t + 50 \times N \tag{6-4}$$

$$C_R^i(g_i, d_i) = \left( LOWG(i) - VOLL(i) \right) \times 50 \times t \tag{6-5}$$
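A minimal sketch of two of the hourly cost terms is given below, restricted to the load shedding cost of Equation (6-3) and the redispatch cost of Equation (6-5). The function names, the default one-hour duration and the example quantities are assumptions for illustration; the coefficients 18000 and 50 are the ones appearing in the equations, with units as defined in Table 5-4.

```python
VOLL_COEFFICIENT = 18000       # cost coefficient appearing in Eq. (6-3)
REDISPATCH_COEFFICIENT = 50    # cost coefficient appearing in Eq. (6-5)


def load_shedding_cost(voll, t=1.0):
    """Cost of lost load over a duration t, following Eq. (6-3)."""
    return voll * VOLL_COEFFICIENT * t


def redispatch_cost(lowg, voll, t=1.0):
    """Cost of generation redispatched from elsewhere (LOWG - VOLL), following Eq. (6-5)."""
    return (lowg - voll) * REDISPATCH_COEFFICIENT * t


# Hypothetical hour: 300 units of wind generation lost, 20 units of load shed
example_shedding = load_shedding_cost(20.0)
example_redispatch = redispatch_cost(300.0, 20.0)
```

The example shows how strongly the load shedding term dominates: a small amount of lost load outweighs the redispatch cost of a much larger amount of replaced generation.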

Risks introduced by a GRS operation can be calculated by multiplying the probability of

a particular operational state with its corresponding impact:

$$Risk(S) = \sum_{i=1}^{N} \left( \Pr(E_i) \times \Pr(S_i) \times Im(g_i, d_i) \times \Pr(g_i) \times \Pr(d_i) \right) \tag{6-6}$$

where N is the number of the samples within a year-round simulation, and the

parameters $\Pr(E_i)$, $\Pr(S_i)$, $\Pr(g_i)$ and $\Pr(d_i)$ represent the probabilities of the

initiating event Ei, the state Si, the generating output gi and the load level di.

The SMCS is a fluctuating convergence process. The estimated indices will approach

their “real” value as the simulation proceeds. The expectation of the risk (E(Risk)) in N

sampling hours can be estimated using the following equation:

$$E(Risk) = \frac{\sum_{i=1}^{N} Risk_i}{N} \tag{6-7}$$

The variance of the estimated risk can be obtained by:

$$\sigma^2 = \frac{1}{N(N-1)} \sum_{i=1}^{N} \left[ Risk_i - E(Risk) \right]^2 \tag{6-8}$$

where Riski denotes the sample value of the risk in the hour i.

The simulation should be terminated when the estimated reliability indices reach a

specified degree of confidence to achieve a compromise between accuracy and

computation effort. The coefficient of variation (α) is often used as the convergence

criterion in SMCS and is defined as:

$$\alpha = \sigma / E(Risk) \tag{6-9}$$

The number of samples required by the SMCS can be determined by the two stopping

rules [59, 92]: The first approach is to use a sequential stopping procedure and to let the



SMCS run until the coefficient of variation (α) reaches the predefined tolerance value.

The second approach is to run a given number of samples and then check if the

coefficient of variation is acceptable. If not, the number of samples can then be

increased. In this simulation process, the SMCS stops when the coefficient of variation

(α) is less than 1%.
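The sequential stopping procedure described above can be sketched as follows. This is an illustrative outline only: the function name, the minimum and maximum sample counts, and the exponentially distributed stand-in for the hourly risk samples are assumptions, not part of the study. The running mean and the variance of the estimate follow Equations (6-7) and (6-8), and the loop stops once the coefficient of variation of Equation (6-9) falls below the tolerance.

```python
import math
import random


def run_until_converged(draw_sample, tolerance=0.01, min_samples=100,
                        max_samples=1_000_000):
    """Draw samples until alpha = sigma / E(Risk) drops below `tolerance`."""
    n, mean, m2 = 0, 0.0, 0.0
    alpha = math.inf
    while n < max_samples:
        x = draw_sample()
        n += 1
        # Welford update of the running mean and sum of squared deviations
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples and mean > 0.0:
            # Standard deviation of the estimated mean, Eq. (6-8)
            sigma = math.sqrt(m2 / (n * (n - 1)))
            alpha = sigma / mean
            if alpha < tolerance:
                break
    return mean, alpha, n


# Hypothetical stand-in for hourly risk samples (not the thesis model)
rng = random.Random(1)
expected_risk, alpha, n_samples = run_until_converged(lambda: rng.expovariate(1.0))
```

Because the coefficient of variation shrinks roughly with the square root of the number of samples, halving the tolerance requires about four times as many simulated hours.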

6.3. System Condition Time-series Model

6.3.1. Wind Forecast Model

Due to the intermittent power output of a wind farm, a wind speed forecast model is

required to accurately predict the wind output from the wind farms. The auto-regressive

and moving averages (ARMA) model, which is an accurate wind speed forecasting

technique, is used to represent the time-series feature and the probabilistic

characteristics of wind speed. The historical hourly wind speed data of an off-shore

wind farm in northwest England over the period 1980 to 2010 [93] was used as the data

base to predict the future wind speed. The speed data is then converted into the wind

generation profile using a wind turbine generator (WTG) model.

[Figure 6-4 flow: Historical Wind Speed Data at Wind Farms → Auto Regression and Moving Average Model (ARMA) → Predicted Wind Speed Data → Wind Turbine Model → Wind Power Output]

Figure 6-4: Procedures to Produce Time-Series Wind Farm Output Data

The ARMA model [94] first standardizes the historical wind speed (WS) samples at

each location as:

$$y_t = (WS_t - \mu_t) / \sigma_t \tag{6-10}$$

where $\mu_t$ and $\sigma_t$ are the historical mean wind speed and standard deviation respectively.

The time sequential data series set yt is then used to establish the wind speed time series

model:

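$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_n y_{t-n} + \alpha_t - \theta_1 \alpha_{t-1} - \theta_2 \alpha_{t-2} - \dots - \theta_m \alpha_{t-m} \tag{6-11}$$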



where $\phi_i$ (i = 1, ..., n) and $\theta_j$ (j = 1, ..., m) are the autoregressive and moving average parameters of the ARMA

model. $\alpha_t$ is a normal white noise process with zero mean and a variance of $\sigma_a^2$ (i.e.

$\alpha_t \in NID(0, \sigma_a^2)$), where NID denotes normally independently distributed. The

simulated wind speed at a particular location and time is obtained as:

$$V_t = \mu_t + \sigma_t y_t \tag{6-12}$$

where $V_t$ is the simulated wind speed at t based on the historical mean wind speed $\mu_t$

and the standard deviation $\sigma_t$.
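The two steps above (generating the standardized series with the ARMA recursion and rescaling it with the historical mean and standard deviation) can be sketched as below. This is a sketch under stated assumptions: the function name, the ARMA(1,1) coefficients, the noise standard deviation and the constant mean and standard deviation profiles are hypothetical values, and clipping negative speeds to zero is an added assumption that the text does not state.

```python
import random


def simulate_wind_speed(phi, theta, sigma_a, mu, sigma, hours, seed=0):
    """Simulate hourly wind speeds with an ARMA(n, m) model and rescale them.

    phi, theta : autoregressive and moving average parameters
    sigma_a    : standard deviation of the white noise process
    mu, sigma  : historical mean and standard deviation profiles (used cyclically)
    """
    rng = random.Random(seed)
    n, m = len(phi), len(theta)
    y_hist = [0.0] * n   # y_{t-1}, ..., y_{t-n}
    a_hist = [0.0] * m   # noise at t-1, ..., t-m
    speeds = []
    for t in range(hours):
        a_t = rng.gauss(0.0, sigma_a)
        y_t = (sum(p * y for p, y in zip(phi, y_hist))
               + a_t
               - sum(q * a for q, a in zip(theta, a_hist)))
        v_t = mu[t % len(mu)] + sigma[t % len(sigma)] * y_t
        speeds.append(max(v_t, 0.0))  # assumption: negative speeds clipped to zero
        y_hist = ([y_t] + y_hist)[:n]
        a_hist = ([a_t] + a_hist)[:m]
    return speeds


# Hypothetical ARMA(1,1) parameters and constant mean/deviation profiles
speeds = simulate_wind_speed(phi=[0.9], theta=[0.2], sigma_a=0.4,
                             mu=[9.0], sigma=[4.0], hours=8760, seed=1)
```

Fixing the seed makes a yearly sample reproducible, while varying it yields the independent yearly samples needed by the SMCS.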

Finally, the wind output characteristics of a 3MW Vestas V90 wind turbine [95] are

used to translate the predicted wind speed data into the generation output. The nonlinear

relationship between wind speed and the output of the wind turbine is described in

Equation (6-13):

$$P_t = \begin{cases} 0, & 0 \le V_t < V_{ci} \\ (A + B V_t + C V_t^2)\, P_r, & V_{ci} \le V_t < V_r \\ P_r, & V_r \le V_t < V_{co} \\ 0, & V_{co} \le V_t \end{cases} \tag{6-13}$$

where Vci, Vr and Vco represent the cut-in wind velocity, rated wind speed and cut-out

wind velocity respectively; Pr represents the rated power of the wind turbine, and A, B

and C are the parameters of the wind turbine output characteristic curve. For this study,

the cut-in, rated and cut-out speeds of the selected wind turbine are 3.5, 15 and 25 m/s

respectively. Hourly wind speeds were repeatedly simulated for 100 yearly samples and

are then mapped into the risk assessment model. Figure 6-5 shows the probability

distributions of raw wind speed data recorded in 31 years and the characteristic of the

WTG model. The mean wind speed and the



Figure 6-5: Wind Speed Data Distribution and Wind Turbine Model
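The piecewise characteristic of Equation (6-13) can be sketched as follows, using the 3 MW rating and the 3.5, 15 and 25 m/s cut-in, rated and cut-out speeds quoted above. The text does not give values for A, B and C; the sketch assumes the commonly used closed-form expressions in terms of the cut-in and rated speeds, which force the curve through zero at cut-in and through rated power at rated speed. Those expressions, the function names and the clamp at zero are assumptions, not the thesis implementation.

```python
def power_curve_coefficients(v_ci, v_r):
    """Assumed closed-form A, B, C for the quadratic section of the power curve."""
    k = ((v_ci + v_r) / (2.0 * v_r)) ** 3
    d = (v_ci - v_r) ** 2
    a = (v_ci * (v_ci + v_r) - 4.0 * v_ci * v_r * k) / d
    b = (4.0 * (v_ci + v_r) * k - (3.0 * v_ci + v_r)) / d
    c = (2.0 - 4.0 * k) / d
    return a, b, c


def wind_power(v, p_r=3.0, v_ci=3.5, v_r=15.0, v_co=25.0):
    """Turbine output in MW for wind speed v in m/s, following Eq. (6-13)."""
    if v < v_ci or v >= v_co:
        return 0.0
    if v < v_r:
        a, b, c = power_curve_coefficients(v_ci, v_r)
        # Clamp tiny negative values just above cut-in (an added safeguard)
        return max(0.0, (a + b * v + c * v * v) * p_r)
    return p_r


# Output at a few wind speeds (m/s)
outputs = {v: wind_power(v) for v in (2.0, 10.0, 20.0, 25.0)}
```

Applying this function to each simulated hourly wind speed produces the wind farm output profile that is mapped into the risk assessment.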

6.3.2. Power System Load Profile

Electric load profile forecasting is one of the most important tasks in Power System operation.

In future, due to the growth in weather-dependent distributed generation, the British

transmission demand will become more variable. The SIPS operation therefore has to be

more flexible and smart in order to accommodate the variation in transmission demand.

Consequently, the test model must integrate the dynamic transmission demand profile to

provide a more precise simulation on how much generation capability is needed and

how often it is required. The IEEE-RTS load model [85], a widely-used system for

Power System Reliability Evaluation, was used to forecast the load variation in a year.

A profile of hourly peak load during a calendar year is created in the load profile with

the detailed data shown in Appendix C.

Figure 6-6: IEEE RTS Yearly Load Profile



6.4. Numerical Illustration of Stochastic SIPS Risk Assessment

In this section, the risk assessment process based on the stochastic method is applied to

an event-based GRS implemented in a multi-unit plant of a more complex IEEE-24 bus

Reliability Test System [85].

Three modifications were made to the original test system to make it more stressed. The

initial system conditions used for GRS risk assessment are:

1) The load level at each load point is increased by 40%.

2) Three 300 MW wind farms are connected to B13.

3) The load at B13 is removed.

In the modified IEEE 24 bus system, L18 and L20 are two critical circuits connecting

the integrated wind farms to the customer side 138kV system in the lower half of Figure

6-7. Due to the integration of the wind farms, these two circuits would be heavily

loaded when the power output at the wind farm and the load level are high. An outage

of either of the critical circuits (i.e. L18, L20) could result in a cascade tripping of the

other circuits interconnecting the power plant and the rest of the system. This can lead

to the outage of the entire wind farm and load shedding at some load points especially

when the load level is high. This situation can be alleviated by disconnecting one of the

three wind farms when a critical line outage is detected.

The GRS, depending on its communication network, could collect the monitoring data

from either local substations or from substations all over the network and use it for

decision making. A system-wide SIPS, which is considered to be the future trend [28],

is implemented in the test system. Unlike the local schemes, system-wide SIPS could

collect measurements from all the substations in the network and use these in a

centralised decision making process. For the studied GRS, the controller uses the

breaker status data of L18 and L20 as an activation signal. In addition, the generator

output is used as arming signal to initiate the scheme. The status of the circuit breakers

at both of the critical lines can be acquired from the local substation B13 or from the

substations B11 and B12 at the remote end of the critical transmission lines. Meanwhile,

the power output of the power plant at B13 is continuously monitored by the power

meters. The GRS is armed only when the power output is higher than a certain level.



Figure 6-7: IEEE 24-Bus Reliability Test System with GRS Logic



Figure 6-8 shows the method used to determine the arming point of the GRS by

comparing the system risks without GRS and the risks when a reliable GRS is

implemented. When the total generation output of the wind farm is below 560 MW, the

system meets the ‘N-1’ criterion. Consequently, outage of one critical circuit will not

bring any risk to the system. However, when the generation level is beyond 560 MW,

outage of a critical circuit will cause cascading tripping and lead to increased risk.

When GRS is implemented, the risks can be controlled and maintained at a relatively

low level. However, when the generation level is lower than 570 MW, implementation

of GRS causes more frequent wind rejection. Consequently, the most economical

strategy is to arm the GRS when the generation level is above 570 MW.

Figure 6-8: Comparison between System Risks with and without GRS

6.5. Stochastic Risk Assessment Results

The historical hourly wind speed data of an off-shore wind farm in northwest England

over the period 1980 to 2010 [93] was used as the data base to predict the future wind

speed. Hourly wind speeds are repeatedly simulated by the ARMA model to obtain a

large number of yearly wind speed samples for the wind farm. Figure 6-9 shows the

probability density distributions of the historical wind data and the simulated wind data

via ARMA model. It can be seen that the probability distribution of the wind speed is

close to normal distribution. The mean wind speed value (μ) and the standard deviation

(σ) of the simulated data and recorded data are extremely close.



Figure 6-9: Simulated and Historical Wind Speed Data Probability Density Function

The simulated wind speed data are then mapped into the SMCS model illustrated in

Figure 6-3. The coefficient of variation (α) is used as the stopping criterion of the

SMCS. As indicated in Figure 6-10, after 876,000 sample hours of simulation (i.e. 100

years), the coefficient of variation of the SIPS risk falls below 1%, which is used as

the error tolerance in this case. Consequently, the SMCS stops and the expectations in

the risk of each SIPS design after 100 years’ simulation period are used for further study.

Figure 6-10: Coefficient of Variation in SIPS Risk with Simulation Hours

The expected annual risks induced by GRS with different tripping logic and

communication architectures are shown in Figure 6-11 and Table 6-1. It can be seen that

the scheme with full redundancy (i.e. Arch4) and vetoing tripping logic delivers the

optimal overall performance with an overall risk of 30376 $/year. More specifically, the

annual risk from scheme normal operation stays at approximately $9000 for all the



designs. Consequently, this is not the main factor affecting the decision making in SIPS

design and it is mainly affected by the operating frequency of the GRS. Risk from GRS

DBM is the main contribution to the total cost for most designs, since the impact of

GRS DBM (e.g. isolation of the entire wind farm, load shedding) is considerably larger

than that of the normal operation and SBM.

Different SIPS communication infrastructures also lead to significant variations in

annual risks from scheme DBM and SBM. Similar conclusions can be drawn in terms of

the selection of the sensor network architectures compared with the analytical analysis

in the previous chapter. The implementation of duplicated bay level process bus system

as compared to single (e.g. Arch2 versus Arch1) and duplicated PRP based LANs

versus single (e.g. Arch3 vs Arch1) can significantly improve the performance in

dependability, with Risk(DBM) decreasing from approximately 43000 $/year to 30500

$/year. However, the redundancy in the communication system also leads to increased

security risk. For example, implementing duplicated LAN (Arch3) on Arch1 (Voting)

will increase the annual Risk(SBM) from 16682 $/year to 20291 $/year. The use of

vetoing logic can effectively decrease the risk of SBM without significantly

compromising the performance in dependability. For example, the risk of SBM for

Arch4 (Vetoing) is 11826 $/year as compared to 24135 $/year for Arch4 (Voting),

whilst the risk of DBM for Arch4 (Vetoing) is 18551 $/year as compared to 16638

$/year for Arch4 (Voting).

Figure 6-11: Annual Risks Induced by Different GRS Designs



Table 6-1: Risk Assessment Results for Different GRS Designs

                 Comm. Architecture                       GRS Risks ($/year)
Arch.            Process Bus             Station Bus      Operation Cost  Risk (DBM)  Risk (SBM)  Risk (total)
Arch1 (voting)   Single Process Bus      Single Ring LAN  8945            42865       16682       59547
Arch1 (vetoing)  Single Process Bus      Single Ring LAN  8937            43872       12060       55932
Arch2 (voting)   Duplicated Process Bus  Single Ring LAN  9041            29283       20526       49809
Arch2 (vetoing)  Duplicated Process Bus  Single Ring LAN  9022            31956       8217        40174
Arch3 (voting)   Single Process Bus      PRP Ring LAN     9035            30103       20291       50394
Arch3 (vetoing)  Single Process Bus      PRP Ring LAN     9030            30824       15669       46493
Arch4 (voting)   Duplicated Process Bus  PRP Ring LAN     9130            16638       24135       40773
Arch4 (vetoing)  Duplicated Process Bus  PRP Ring LAN     9117            18551       11826       30376
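As a simple illustration of how the results in Table 6-1 can be used to rank the designs, the sketch below sums Risk(DBM) and Risk(SBM) for each design and selects the minimum. The dictionary layout and variable names are assumptions for illustration; the figures are those of Table 6-1, whose Risk(total) column appears to be the sum of the two malfunction risks to within rounding.

```python
# (Risk(DBM), Risk(SBM)) in $/year for each design, taken from Table 6-1
designs = {
    ("Arch1", "voting"): (42865, 16682),
    ("Arch1", "vetoing"): (43872, 12060),
    ("Arch2", "voting"): (29283, 20526),
    ("Arch2", "vetoing"): (31956, 8217),
    ("Arch3", "voting"): (30103, 20291),
    ("Arch3", "vetoing"): (30824, 15669),
    ("Arch4", "voting"): (16638, 24135),
    ("Arch4", "vetoing"): (18551, 11826),
}

# Total malfunction risk per design and the design with the lowest total
totals = {name: dbm + sbm for name, (dbm, sbm) in designs.items()}
best_design = min(totals, key=totals.get)
```

The same ranking can be repeated on the Risk(DBM) or Risk(SBM) values alone when dependability or security is the overriding requirement.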

6.6. Comparison between Local GRS and System Wide GRS

Based on the scheme communication architecture, SIPS application can be classified

into local SIPS and system wide SIPS. Most of the existing SIPS are local [17]. This

means the sensing, decision making and control devices are all allocated within the

same substation. The standalone nature of the local SIPS makes it difficult to achieve

coordination between different SIPS and may lead to an extensive maintenance effort if

the number of SIPS being implemented in the system increases.

The risks introduced by a system-wide centralised GRS (C-GRS), as implemented in the studied

system, are compared with those of a local GRS in this section. The aim is to evaluate whether a

system wide GRS could provide an enhanced or reduced reliability performance

compared with the local GRS. In Figure 6-12, the risks of the optimal local GRS design

(Arch4, voting) are compared with those of the optimal C-GRS design (Arch4, vetoing). The

failure rate of the WAN is varied over a wide range to observe its impact on scheme

risks. The local GRS is activated by the line outage information collected from the line

outage detection system monitoring the status of CB1 and CB2 at substation B13. With

all the sensing, decision making and implementation devices of a local GRS installed in

a single substation, its risks as represented in the dash lines, are not affected by the

variation in the WAN’s reliability. However, Risk(DBM) and Risk(SBM) of the C-GRS

both increase significantly with the growth in WAN failure rate. For example, if we use

the assumed failure rate of WAN as a base value, the risk of a C-GRS is lower than the

risk of a local GRS when the failure rate of the WAN is less than 2.8 times the base



value. In addition, the security risk of the C-GRS is higher than that of the local GRS when

the failure rate of the WAN is higher than 0.59 times the base value. However, a better

overall risk is achieved under most conditions by the C-GRS because of its enhanced

performance in terms of dependability.

The assessment results indicate that a centralised SIPS could achieve equal or better

performance as compared with the local schemes given two preconditions: firstly, the

reliability of the WAN should be high since it directly affects the performance of the C-

SIPS. Secondly, with more sources of information, more tripping options can be

designed for the C-SIPS. Therefore, suitable tripping logic is needed to balance

dependability with security to achieve optimal overall performance. This result should

encourage utilities to centralise the existing standalone SIPS to achieve enhanced

performance and SIPS coordination.

Figure 6-12: Comparison between a Local GRS and a System-Wide GRS

6.7. Impact of Variation in Wind Level on Risk Assessment Results

As illustrated in Chapter 5, the impact of uncertainty in the reliability data used in the

risk evaluation results can be effectively assessed via sensitivity study. The reliability of

the components, the frequency of the triggering event and the system conditions are

considered as main factors affecting the performance of the protection scheme. For a

GRS implemented at a wind farm, the output from a wind farm is a factor which affects

its frequency of operation and operational risks. The wind level at one location varies

significantly throughout a year. Figure 6-13 illustrates the variation in monthly average

wind speed across 100 years based on the wind data predicted by ARMA model.



In the UK, the wind level peaks in winter, whilst the minimum wind level occurs in

summer. In particular, the monthly wind speed in January is the highest across a year,

averaging 11.45 m/sec. The highest average monthly output from the wind farm reaches

58.7% of its capacity. Nevertheless, the wind speed averages the lowest during June,

with an average of 7.41 m/sec. The lowest monthly wind output observed in June is

15.3% of the total capacity of the wind farm. Consequently, the monthly power output

from a wind farm observed over 100 years simulation period varies from 15.3% to 58.7%

of its capacity, with an average output being 32.1%. Although most of the time the

output is close to the average value, the risk of GRS at extreme conditions needs to be

examined.

Figure 6-13: Monthly Average Wind Speed Variation over 100 years

The variation in the monthly average risk of a GRS under each wind scenario (i.e. Low,

average and high) is illustrated in Figure 6-14. It can be seen the operational costs

associated with both normal GRS operation and GRS maloperations (i.e. DBM and

SBM) increase with the wind output. The operational risk is expected to

increase significantly during the winter time, with a high wind level and load demand.

In addition, in this case, the selection of the optimal GRS design is not affected by the

variation in wind level. The risk induced by Arch4 (vetoing) remains the lowest as

compared with other designs.



Figure 6-14: GRS Risks under Various Average Monthly Wind Levels

6.8. Summary

This chapter first provides an overview of the future GB energy landscape by reviewing

the latest documentation from UK National Grid. The expected significant increase in

intermittent renewables makes the system less reliable, and in future operation will rely

more on SIPS. To understand the implication of this decision, a more dynamic and

accurate SIPS risk assessment method is required.

Knowing the limitation of the analytical risk assessment method introduced in Chapter

5, a method based on Sequential Monte Carlo Simulation (SMCS) is proposed in this

chapter and applied to the evaluation of the reliability of a Generator Rejection Scheme

implemented in a system with a high penetration of wind generation. An ARMA model

based wind output prediction model was developed to forecast the generation output of

the wind farms based on historical wind speed data. The time-series generation and

demand models are then mapped into the risk assessment procedure and used to assess

the performance of GRS under various system conditions.

The study helps determine the optimal arming criteria by comparing the system risk

with and without SIPS implementation. The risks of GRS with different communication

architectures are assessed to determine the optimal design. The design of the process

bus communication system, the bay sensor network and the station bus communication system

affects the performance of the GRS and the trade-off between scheme

dependability and security. The required trials for the SMCS based method can be



determined by monitoring the variation in SIPS risks. The fluctuations in the generation

output from a wind farm lead to a large variation in SIPS operational risks. They affect

the costs associated with normal operation, DBM as well as SBM. Therefore, a precise

wind level prediction and a dynamic SIPS risk assessment method are critical to

effectively forecast and manage the system operational risk.

The operational performance of a system wide GRS is compared with a local GRS.

Enhanced performance can be provided by the system wide GRS given a relatively high

reliability of the wide area communication system. The access to wide area information

could significantly assist the estimation of system condition and bring more flexibility

in the SIPS logic processing design. The increased probability of SBM brought by the

additional detection systems implemented for a GRS could be effectively controlled by

using a ‘vetoing’ tripping logic to validate the scheme operation. The introduction of

a wide area communication network into the system could also facilitate condition

monitoring and the coordination of the protection schemes, especially in a system with

a high level of wind penetration. This will be further illustrated in Chapter 7.

In the near future, significant amounts of renewable generation will need to be

integrated into the system, which means the system condition will become more

difficult to predict. The variations in system condition also make it difficult for one

protection scheme logic to be suitable for all the system operational conditions.

Therefore, an adaptive SIPS, which changes its operational logic based on wide area

real-time data, needs to be designed to better manage the risk brought by a SIPS

implementation.

Chapter 7

MANAGING THE RISK OF SIPS IN

POWER SYSTEM LONG-TERM PLANNING

7.1. Introduction of electric system planning with SIPS

An increasing number of SIPS are now being implemented at different locations on a

power network and are being used for various control actions. In particular, as the

exploitation of wind power expanded quickly in the period 2000-2016 due to

improved and less expensive wind turbine technology, increased fossil fuel prices,

government subsidies and other policy incentives, greater use of SIPS has become

increasingly attractive in long-term system planning. This is mainly because of its

relatively low cost compared with transmission expansion. Consequently, the

widespread proliferation of SIPS has resulted in increased operational complexity and a

higher probability of unintended SIPS interactions. This significantly increases the risks

to the Power System brought by SIPS.

This chapter focuses on assessing the operational risk of using SIPS, when considering

the challenges SIPS may face in the long-term future. The possible unintended SIPS

interactions caused by the growth in the number of SIPS are investigated. Meanwhile, a

computational model to address the problem of a SIPS-aided transmission expansion

plan is proposed.

7.1.1. Electric system long-term planning with SIPS

Power System planning involves the systematic assembly and analysis of the new

facilities and equipment (e.g. generation, transmission, distribution) needed to replace

worn-out systems and ones required to satisfy changing electricity demand. Planning

methodologies have been developed for energy supply, transport, and demand that

present the information that decision-makers need to choose the appropriate course of action.

An improved transmission network that enables more efficient power transfer within or

between regions will be described. The benefits of building long-term transmission

facilities include:

• Greater access to low cost power.

• Reducing the transmission congestion costs.

• Reducing line losses, observed as heat, during electricity transmission.

• Improving the electricity import and export ability of a region and new built flexibility in absorbing new resource allocations.

Composite Power System expansion planning is usually developed by a combination of

reliability and economic justifications. Traditional transmission expansion is reliability

driven and needs to include generation and transmission adequacy assessment,

which should account for the uncertainties in generation, the transmission network and load.

The economic factors that need to be considered in transmission planning include:

production cost, investment cost, congestion charges and system interruption cost. In

addition, in the long-term system planning, inflation also needs to be considered as the

cost of materials for building transmission lines and SIPS increases with time.

Due to the rapid growth in wind generation and a relatively slow transmission

expansion rate, utilities may need protection schemes like SIPS to trip non-priority

generation to alleviate the congestion and allow greater access to low-cost power. Thus,

SIPS used for generator rejection could postpone the need for a new transmission

facility and may affect the transmission expansion planning. Nevertheless, as described

in the previous chapters, introduction of SIPS also brings additional risks which need to

be assessed. Therefore, the impact of SIPS on transmission planning decisions and the


induced additional risks need to be predicted, assessed and considered in the planning

framework.

7.1.2. Challenges in SIPS Coordination

A main contributing factor to the challenges in SIPS coordination is the high

penetration of SIPS in the Power System. This significantly increases the complexity of

system operation and may lead to a higher probability of undesired interactions between

the SIPS. The maloperation of individual SIPS (i.e. Dependability-based Maloperation

or Security based Maloperation) is another factor which may contribute to the spreading

of an electrical disturbance and eventually trigger the operation of other SIPS. Both the

Irish incident on 5th August 2005 [10] and the Nordic event on 1st December 2005 [11]

as mentioned in Chapter 2 were caused by the interaction between neighbouring and

overlapping schemes.

In addition, limitations in the extent of system studies or incomplete studies that do not

fully analyse the inter-relationship between a newly implemented SIPS and the existing

schemes may also lead to a SIPS interaction during normal operation. For example, an

unwanted operation of a generator rejection scheme will cause a reduction in system

frequency. This might accidentally trigger under-frequency load shedding.

Consequently, system studies need to be performed to ensure that the frequency dip

caused by the GRS would not trigger the scheme designed for load shedding to avoid

the undesirable interaction. Although SIPS are highly reliable, the rapid growth in their

use and the catastrophic impact of a maloperation highlight the need to include the

risks associated with SIPS interactions in the SIPS risk assessment procedure.

A brief discussion of the increased operational complexity brought by the high

penetration of SIPS is provided by McCalley in [96]. The conceptual relationship

between the number of SIPS and system operational risks is illustrated in

Figure 7-1. The impact of different strategies (i.e. SIPS, Transmission Upgrade) on the

risk in system operation is compared. With the increase in load level as indicated in

dotted lines, the system becomes more stressed. Consequently, with no actions taken,

the system operational risk reaches the limit when the demand is beyond 80 GW at

point A. The implementation of SIPS could effectively reduce the operational risk,

which could be controlled within the acceptable risk limit for loading levels within


100 GW (i.e. point B). However, without transmission upgrade, the system operational

risk would eventually reach the acceptable limit due to increased risk associated with

SIPS maloperations and undesirable interactions. By transmission upgrading, the risks

can be controlled within the limit even when the load level is beyond 100 GW.

However, it is also an expensive solution and may not be practical in many cases.

Therefore, by combining the SIPS and transmission upgrading, a secure and cost-

effective approach can be carried out as shown by the red line. This helps avoid the

continuously increasing system operational complexity due to SIPS application. In

addition, the building of new transmission facilities can be postponed. Although Figure 7-1

clearly reflects the conceptual relationship between system operational risk and various

system expansion strategies, a quantitative study taking into account the impact of SIPS

interactions and the generation expansion plan is required. This is proposed in this chapter.

Figure 7-1: Conceptual Relationship between SIPS Number and System Operational

Risks [96]

It was recommended in [97] that SIPS operational complexity be considered in SIPS reliability

assessment. The study illustrated how to use a multi-stage tree approach and an

analytical process to enumerate the transmission planning options and minimize

operational complexity. However, a quantitative method to evaluate risk caused by SIPS

interactions was not provided. As required in [14], SIPS should be designed to have

dedicated protection relays and communication systems to prevent interactions with


other systems. However, with the increase in the number of SIPS and in centralised control,

interactions caused by common mode failures become inevitable. As reviewed in

Chapter 5, most of the existing SIPS risk assessment methods focus on the risk

assessment of a single SIPS. These are insufficient to assess the risk of a system with

multiple SIPS. Hence, in this chapter, a method based on multi-level Markov Modelling

is proposed to consider all the possible interaction states between SIPS in neighbouring

systems. This helps identify the worst system condition which could be caused by SIPS

cascading failures and helps adjust the SIPS logic to achieve optimal coordination between

different SIPS.

7.2. Risk Assessment Methodologies Considering SIPS Interaction

A brief risk assessment procedure in a SIPS-rich system is illustrated in Figure 7-2.

Compared with the original process, a few modifications have to be included in the risk

assessment procedure. First, a system study is carried out to identify all the SIPS in the

system and their operational logic, communication infrastructures and possible failure

modes. Next, the reliability assessment is modified by including a system-level Markov

Model to determine the probability and frequency of being in various SIPS interaction

scenarios. Finally, the impact assessment has to consider the impact of SIPS cascading

events, which may lead to a severe level of overall system risk.

[Flowchart blocks, in reading order: Identify all the SIPS in the system and their operational logic; Component-level and System-level Markov Model; Identify SIPS interaction scenarios; Predict Operating Conditions (Wind Generation and Load); Sequential Monte Carlo Simulation; Impact and Risk Assessment; System Upgrade/Generation Integration.]

Figure 7-2: Risk Assessment Procedure Considering SIPS Interactions


7.2.1. Description of the System-level Multi-state Markov Model

The reliability assessment process is first modified to include a system-level Markov

Model, designed to effectively determine all the possible interactions between the SIPS

and assess the transitions between these states. Once the individual failure modes of

each SIPS are obtained using component-level reliability assessment as described in

Section 5.4.2., the interactions between SIPS are assessed using a system-level Markov

Model. A single SIPS can be in three operational states (i.e. Normal state, DBM and

SBM), which means the total number of operational states in the system will be $3^N$,

where N is the number of SIPS in the system.
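The size of this state space can be illustrated with a short enumeration. The following is a minimal Python sketch, not part of the original study; the function names `enumerate_states` and `level` are hypothetical, and the "level" of a joint state is assumed here to be the number of schemes that are not in the normal mode, which matches the Level 0, 1 and 2 grouping used for two schemes.

```python
from itertools import product

# Operating modes of a single SIPS: Normal (N), DBM (D) and SBM (S).
MODES = ("N", "D", "S")


def enumerate_states(n_sips):
    """Return every joint state of n_sips schemes as a tuple of modes."""
    return list(product(MODES, repeat=n_sips))


def level(state):
    """Assumed level definition: number of schemes not in the normal mode."""
    return sum(mode != "N" for mode in state)


# Group the joint states of a two-scheme system by level.
states = enumerate_states(2)
by_level = {}
for state in states:
    by_level.setdefault(level(state), []).append(state)

print(len(states))  # number of joint states, 3**2
for lvl in sorted(by_level):
    print(lvl, by_level[lvl])
```

The count grows as $3^N$, so a full enumeration quickly becomes impractical for systems with many schemes, which motivates the simplified model introduced later in this section.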

[State-transition diagram. Level 0: State 1 (SIPS-1: N, SIPS-2: N). Level 1: State 2 (SIPS-1: D, SIPS-2: N), State 3 (SIPS-1: N, SIPS-2: D), State 4 (SIPS-1: N, SIPS-2: S), State 5 (SIPS-1: S, SIPS-2: N). Level 2: State 6 (SIPS-1: D, SIPS-2: D), State 7 (SIPS-1: D, SIPS-2: S), State 8 (SIPS-1: S, SIPS-2: D), State 9 (SIPS-1: S, SIPS-2: S). The transitions are labelled with the failure rates λ1d, λ1s, λ2d, λ2s, λcd, λcs and the repair rates μ1d, μ1s, μ2d, μ2s, μcd, μcs.]

Figure 7-3: System-level Markov Model to Assess Interaction between Two SIPS

A system-level Markov Model used to assess a system consisting of two schemes (i.e.

SIPS-1 and SIPS-2) is shown in Figure 7-3. A total number of 9 operational states ($3^2 = 9$)

are considered in the model. State 1 (Level-0) is the most common and ideal operational

state, since both SIPSs are in the normal operating mode (N) and contingencies at

different locations in the system can be effectively mitigated by the corresponding SIPS.

States 2-5 (Level 1) are the most common cases for SIPS maloperation, with only one

scheme maloperated while the other is in the normal operational mode. The level-1

states do not usually result in any cascading failures because the system is normally

designed to withstand the outage of any system component and the other SIPS can be

used as backup protection to prevent the spreading of the failure. However, under severe

situations, the initial scheme failure may change the power flow and generation output,

in which case the operation of the other scheme may be inappropriate under the new


system conditions. States 6 to 9 (Level-2) are the states with the most severe

consequences on system operation, with both SIPS in a failure state (either DBM or

SBM). Maloperation of one scheme may lead to the disturbance of other parts of the

system and may require the operation of the other scheme. If the other SIPS is also in its

failure mode, serious cascading consequences could be caused by Level 2 operational

states.

The failure rates (λd, λs, λcd) and the repair rates (μd, μs, μcs) in the Markov model

drive the transitions from one state to another. Although a SIPS needs to be designed

independent of the other schemes, common mode failures may still exist between

different SIPS due to inadequate design or due to a high level of centralisation in the

SIPS communication system. The common mode failures (λcd, λcs), which lead to

simultaneous failure of both schemes, can directly cause transition from the Level 0

state to a Level 2 state. For example, a dependability failure of the WAN could cause

DBM for both SIPS in the system. It is assumed that a common mode component

failure will have the same impact on both SIPS. Therefore, the transition from State

1 to State 7 or State 8 is neglected in the Markov Model.

To use a simplified system-level Markov Model and to represent the operational states

in a system with multiple schemes (more than two), it is assumed all the SIPS have

identical failure and repair rates (i.e. λ1d=λ2d, μ1d=μ2d). This allows the 9-state Markov

Model to be simplified into a 6-state Markov Model as depicted in Figure 7-4.

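[State-transition diagram. Level 0: S1 (All SIPS: N). Level 1: S2 (1 SIPS: D), S3 (1 SIPS: S). Level 2: S4 (2 SIPS: D), S5 (1 SIPS: D, 1 SIPS: S), S6 (2 SIPS: S). The transitions are labelled with the failure rates Λ12, Λ13, Λ14, Λ16, Λ24, Λ25, Λ35, Λ36 and the repair rates M21, M31, M41, M61, M42, M52, M53, M63.]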

Figure 7-4: Simplified System-level Markov Model for a System with Multiple SIPS

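The stochastic transitional probability matrix of the simplified 6-state Markov Model is given by:

$$
\mathbf{Pr} =
\begin{bmatrix}
1-(\Lambda_{12}+\Lambda_{13}+\Lambda_{14}+\Lambda_{16}) & \Lambda_{12} & \Lambda_{13} & \Lambda_{14} & 0 & \Lambda_{16} \\
M_{21} & 1-(M_{21}+\Lambda_{24}+\Lambda_{25}) & 0 & \Lambda_{24} & \Lambda_{25} & 0 \\
M_{31} & 0 & 1-(M_{31}+\Lambda_{35}+\Lambda_{36}) & 0 & \Lambda_{35} & \Lambda_{36} \\
M_{41} & M_{42} & 0 & 1-(M_{41}+M_{42}) & 0 & 0 \\
0 & M_{52} & M_{53} & 0 & 1-(M_{52}+M_{53}) & 0 \\
M_{61} & 0 & M_{63} & 0 & 0 & 1-(M_{61}+M_{63})
\end{bmatrix}
\tag{7-1}
$$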

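Assuming there are N SIPSs in the system, the equivalent transition rates of the simplified Markov Model can be approximated by summing the failure rates and averaging the repair rates which lead to the new operational states:

$$
\begin{array}{ll}
\text{Failure rates:} & \text{Repair rates:} \\
\Lambda_{12} = N\lambda_d & M_{21} = \mu_d \\
\Lambda_{13} = N\lambda_s & M_{31} = \mu_s \\
\Lambda_{14} = C_N^2\,\lambda_{cd} & M_{41} = \mu_{cd} \\
\Lambda_{16} = C_N^2\,\lambda_{cs} & M_{61} = \mu_{cs} \\
\Lambda_{24} = (N-1)\lambda_d & M_{42} = 2\mu_d \\
\Lambda_{25} = (N-1)\lambda_s & M_{52} = \mu_s \\
\Lambda_{35} = (N-1)\lambda_d & M_{53} = \mu_d \\
\Lambda_{36} = (N-1)\lambda_s & M_{63} = 2\mu_s
\end{array}
\tag{7-2}
$$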
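Equations (7-1) and (7-2) can be assembled mechanically. The following is a minimal Python sketch under stated assumptions: the function name `simplified_transition_matrix` is hypothetical, the rates are taken to be expressed per time interval (so that each row of the matrix sums to one), and the numerical rates in the example call are illustrative placeholders rather than values from this study.

```python
from math import comb


def simplified_transition_matrix(n, lam_d, lam_s, lam_cd, lam_cs,
                                 mu_d, mu_s, mu_cd, mu_cs):
    """Build the 6x6 matrix of Equation (7-1) for states S1..S6.

    All rates are assumed to be per time interval and small enough
    for every diagonal entry to stay between 0 and 1.
    """
    pairs = comb(n, 2)  # C_N^2, the number of SIPS pairs
    rates = {
        # Failure transitions, Equation (7-2), left column.
        (1, 2): n * lam_d, (1, 3): n * lam_s,
        (1, 4): pairs * lam_cd, (1, 6): pairs * lam_cs,
        (2, 4): (n - 1) * lam_d, (2, 5): (n - 1) * lam_s,
        (3, 5): (n - 1) * lam_d, (3, 6): (n - 1) * lam_s,
        # Repair transitions, Equation (7-2), right column.
        (2, 1): mu_d, (3, 1): mu_s, (4, 1): mu_cd, (6, 1): mu_cs,
        (4, 2): 2 * mu_d, (5, 2): mu_s, (5, 3): mu_d, (6, 3): 2 * mu_s,
    }
    matrix = [[0.0] * 6 for _ in range(6)]
    for (i, j), rate in rates.items():
        matrix[i - 1][j - 1] = rate
    for i in range(6):
        # The diagonal is still zero here, so this is 1 minus the off-diagonal sum.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


# Illustrative call with placeholder rates (not data from the thesis).
P_example = simplified_transition_matrix(
    n=2, lam_d=1e-5, lam_s=1e-6, lam_cd=1e-7, lam_cs=1e-8,
    mu_d=0.01, mu_s=0.5, mu_cd=0.01, mu_cs=0.5)
for row in P_example:
    print(["%.3g" % value for value in row])
```

Storing the rates in a dictionary keyed by (from, to) state numbers keeps the code in one-to-one correspondence with the subscripts of Equation (7-2), which makes a term-by-term check straightforward.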

The probabilities of being in each operational state in the Markov Model after m time intervals, $\mathbf{Pr}^{(m)}$, and the frequency of encountering each state, $f(S)$, can be calculated as follows:

$$ \mathbf{Pr}^{(m)} = \mathbf{Pr}^{(0)}\,\mathbf{Pr}^{m} \tag{7-3} $$

$$ f(S) = Pr(S)\,\lambda_d(S) = Pr(\bar{S})\,\lambda_e(S) \tag{7-4} $$

$$ f(S1) = Pr(S1)\,(\Lambda_{12}+\Lambda_{13}+\Lambda_{14}+\Lambda_{16}) = Pr(S2)\,M_{21} + Pr(S3)\,M_{31} + Pr(S4)\,M_{41} + Pr(S6)\,M_{61} \tag{7-5} $$

$$ f(S2) = Pr(S2)\,(\Lambda_{24}+\Lambda_{25}+M_{21}) = Pr(S1)\,\Lambda_{12} + Pr(S4)\,M_{42} + Pr(S5)\,M_{52} \tag{7-6} $$

$$ f(S3) = Pr(S3)\,(\Lambda_{35}+\Lambda_{36}+M_{31}) = Pr(S1)\,\Lambda_{13} + Pr(S5)\,M_{53} + Pr(S6)\,M_{63} \tag{7-7} $$

$$ f(S4) = Pr(S4)\,(M_{41}+M_{42}) = Pr(S1)\,\Lambda_{14} + Pr(S2)\,\Lambda_{24} \tag{7-8} $$

$$ f(S5) = Pr(S5)\,(M_{52}+M_{53}) = Pr(S2)\,\Lambda_{25} + Pr(S3)\,\Lambda_{35} \tag{7-9} $$

$$ f(S6) = Pr(S6)\,(M_{61}+M_{63}) = Pr(S1)\,\Lambda_{16} + Pr(S3)\,\Lambda_{36} \tag{7-10} $$


where $Pr(S)$ and $Pr(\bar{S})$ are the probabilities of being and not being in the operational

state S, $\lambda_d(S)$ represents the rate of departure from the state S and $\lambda_e(S)$ represents the

rate of encountering the state S. The reliability assessment results are then integrated in the

risk assessment procedure to evaluate the overall operational risks including SIPS

maloperation and undesirable interaction.
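The propagation in Equation (7-3) and the departure form of the frequency in Equation (7-4) can be sketched for any row-stochastic matrix. The Python below is an illustrative sketch only: the helper names are hypothetical, and a two-state toy matrix with made-up entries is used instead of the 6-state model so that the result is easy to check by hand. The departure rate of a state is taken as one minus its diagonal entry, which is an assumption consistent with the row structure of Equation (7-1).

```python
def step(prob, matrix):
    """One time interval: row vector `prob` multiplied by `matrix`."""
    size = len(matrix)
    return [sum(prob[i] * matrix[i][j] for i in range(size))
            for j in range(size)]


def state_probabilities(matrix, prob0, m):
    """State probabilities after m intervals, Pr(m) = Pr(0) * Pr^m."""
    prob = list(prob0)
    for _ in range(m):
        prob = step(prob, matrix)
    return prob


def encounter_frequencies(matrix, prob):
    """f(S) = Pr(S) * (rate of departure from S), with the departure
    rate read from the matrix as 1 - matrix[S][S]."""
    return [prob[i] * (1.0 - matrix[i][i]) for i in range(len(matrix))]


# Two-state toy example with made-up entries (not data from the thesis).
toy = [[0.9, 0.1],
       [0.5, 0.5]]
prob = state_probabilities(toy, [1.0, 0.0], 200)
freq = encounter_frequencies(toy, prob)
print(prob)  # approaches the steady state of the toy chain
print(freq)  # departure frequencies of the two states
```

In the steady state of a two-state chain every departure from one state is an entry into the other, so the two frequencies coincide; this is the balance expressed by the two sides of Equation (7-4).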

7.2.2. Modified Impact Assessment Procedure

The modified impact assessment is performed in two steps. First, the impact of each

SIPS operation after a triggering event is evaluated. The changes in power flow and

generation output brought by the SIPS’s remedial actions or maloperations are updated

in the testing system. Next, the impact of the initial SIPS operation or maloperation on

the other SIPS in the system is investigated. A system study is carried out to investigate

whether the initial scheme would trigger the operation of other schemes in the system,

and therefore cause an additional impact on system reliability.

Due to the complexity in the SIPS operational conditions, the Sequential Monte Carlo

Simulation (SMCS) is used to assess the impact of each SIPS operational state under

various system conditions. A set of different system operating conditions are predicted

based on both historical data and the simulation models. Extreme scenarios such as

severe weather conditions or high load demand have to be included and emphasized

since they may lead to cascade SIPS operation. A dynamic annual hourly load profile

based on the IEEE-RTS load model and wind farm output profiles created using the ARMA

model are integrated into SMCS in order to reflect the random behaviour of the system.

The impact of each of the 9 operational states is evaluated using various impact indices

and can be expressed as:

$$ Im(g_i, d_i) = C_L^i(d_i) + C_G^i(g_i) + C_R^i(g_i, d_i) + C_S^i(g_i, d_i) \tag{7-11} $$

where $Im(g_i, d_i)$ computes the impact of different interactions as a function of generation output and load demand, and the parameters $C_L^i(d_i)$, $C_G^i(g_i)$, $C_R^i(g_i, d_i)$ and $C_S^i(g_i, d_i)$ are the costs associated with load shedding, wind curtailment, generating capacity redispatch and restart of tripped generators or wind farms respectively.


The risk introduced to the system is calculated as the product of the probability or

frequency of each SIPS interaction state and its impact on the system. The risk of the DBM related operational states, $Risk_{DBM}(S)$, and the risk of the SBM related states, $Risk_{SBM}(S)$, can be calculated as:

$$ Risk_{DBM}(S) = \sum_{i=1}^{N} \big( Pr(E_i) \times Pr(S) \times Im(g_i, d_i) \times Pr(g_i) \times Pr(d_i) \big) \tag{7-12} $$

$$ Risk_{SBM}(S) = \sum_{i=1}^{N} \big( Fr(S) \times Im(g_i, d_i) \times Pr(g_i) \times Pr(d_i) \big) \tag{7-13} $$

where N is the number of samples within the period of simulation. The parameters $Pr(E_i)$, $Pr(S)$, $Pr(g_i)$ and $Pr(d_i)$ represent the probabilities of the initiating event $E_i$, the state S, the generating output $g_i$ and the load demand $d_i$. $Fr(S)$ is the frequency of encountering the SBM states.
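The two sums in Equations (7-12) and (7-13) translate directly into code. The sketch below is illustrative only: the function names and the tuple layout of the samples are hypothetical choices, and the numbers in the example are made up to show the arithmetic, not results of this study.

```python
def risk_dbm(pr_state, samples):
    """Equation (7-12): sum over samples of
    Pr(E_i) * Pr(S) * Im(g_i, d_i) * Pr(g_i) * Pr(d_i).

    `samples` is an iterable of (pr_event, impact, pr_gen, pr_load).
    """
    return sum(pr_event * pr_state * impact * pr_gen * pr_load
               for pr_event, impact, pr_gen, pr_load in samples)


def risk_sbm(fr_state, samples):
    """Equation (7-13): sum over samples of
    Fr(S) * Im(g_i, d_i) * Pr(g_i) * Pr(d_i).

    `samples` is an iterable of (impact, pr_gen, pr_load).
    """
    return sum(fr_state * impact * pr_gen * pr_load
               for impact, pr_gen, pr_load in samples)


# Made-up samples, used only to illustrate the calculation.
dbm_samples = [(0.1, 1000.0, 0.5, 0.5), (0.2, 2000.0, 0.5, 0.5)]
sbm_samples = [(1000.0, 0.5, 0.5)]
print(risk_dbm(0.5, dbm_samples))
print(risk_sbm(0.01, sbm_samples))
```

The only structural difference between the two functions is that the DBM risk is conditional on an initiating event, whereas an SBM can occur without one and is therefore weighted by the state frequency alone.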

7.3. Method Numerical Illustration

To evaluate the effectiveness of the risk assessment method, the PJM 5-bus system [98]

was used to illustrate the impact of SIPS maloperations and undesirable interactions on

system integrity. All the transmission lines in the system are assumed to have an

identical thermal rating of 400 MVA. The cost and MW limit of each generator are

illustrated in Figure 7-5. Two wind farms were initially integrated at buses B1 and B5,

with an installed capacity of 100 MW each. Due to the low cost, the use of wind

generation is given exclusive priority in the system. The power is transferred from the

generation centre with a relatively low cost to the load centre. However, the installation

of the wind farms stressed the connecting transmission lines, especially when the wind

farm outputs or the load levels are high. L1, L2 and L6 are the three heavily loaded lines

connecting the main generations to the load.

The ‘N-1’ criterion may not be satisfied during stressed system conditions. When there

is a permanent fault on these critical circuits, the associated protection device will trip

the circuit breaker to clear the fault on the line. An outage on any one of the three

critical lines could lead to power flow on the other two lines exceeding 400MVA and

therefore initiate cascading trips, which would isolate the generation centre (B1 and B5)

from the load centre (B2, B3 and B4). This may eventually lead to a higher generation


cost by the use of G2 and G3. When load demand is high, disconnection of customers at

B2, B3 and B4 might be required. However, instead of upgrading the

transmission network (i.e. a new line), the SIPS could efficiently maximize the power

transfer to the load centre whilst maintaining system reliability.

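[Single-line diagram of the modified PJM 5-bus system: buses B1 to B5, lines L1 to L6, generators G1 to G4 and wind farms WF1 (100 MW) and WF2 (100 MW), with the Generation Center and the Load Center marked. Capacity and cost labels on the diagram: 300 MW; 400 MW at 15 $/MWh; 400 MW at 30 $/MWh; 200 MW at 40 $/MWh; 13 $/MWh.]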

Figure 7-5: Modified PJM 5-bus System with Wind Farms

SIPS-1 and SIPS-2 could be implemented at WF1 and WF2 respectively to enhance the

system integrity. For high wind speed at WF1 and consequently high wind output,

SIPS-1 is armed to continuously monitor the status of L1 and L2. Under stressed system

conditions, when there is an outage on either of the two lines, the other line will be

overloaded and there is a possibility of cascade tripping. In this case, SIPS-1 is designed

to disconnect the WF1 from the system to relieve the overloaded lines and prevent

cascade tripping and the isolation of all the generation plants at B1. When the output of

WF1 is low and the output at WF2 is high, the operation of SIPS-1 may not be

sufficient to relieve the overloading on L1 or L2. Consequently, the operation of SIPS-2

is required to disconnect WF2 from service as a backup protection. The initiating event

for SIPS-2 is the outage of L3 or L6. Following the outage of L6, and assuming a

high load demand and a low wind speed at WF2, L1 and L2 could be heavily

loaded, even after the operation of SIPS-2. Therefore, the operation of SIPS-1 may be

required following the operation of SIPS-2. In this system, the operation of one SIPS


affects the other schemes; therefore, the interaction scenarios need to be studied. A case

study, involving different SIPS designs, was used to evaluate the performance of a

SIPS-rich system.

7.3.1. Reliability Assessment Results

Knowing the operational logic of the SIPS in the studied system, the individual

operational mode and possible interconnections between the schemes can be identified.

With each scheme having 3 operational states (i.e. N, DBM and SBM), a total number

of 9 ($3^2$) states are considered. If we assume both schemes have the same design, and

consequently the same failure and repair rates, the simplified 6-State Markov Model is

used for reliability assessment.

Based on the previously discussed reliability assessment results in Chapter 3, the mean

time to failure (MTTF) for the circuit breakers, the process bus and merging units, the

IEDs, the LAN and the WAN are 100, 59.9, 100, 63.8 and 50 years respectively. The

probabilities of different SIPS states and interactions are assessed using the Markov

Modelling described in Equations (7-3) - (7-10). The probability of being in each

“failure to operate” state and the frequency of encountering a spurious operation state in

the next hour for each SIPS design are estimated and recorded in Appendix D.

Table 7-1 compares the reliability of implementing the SIPS with different levels of

redundancy and tripping logic in the studied system. The probabilities of being in the

DBM related states (i.e. S2, S4 and S5) can be reduced by providing redundancy in the

communication system. For example, by duplicating the substation process bus and

station bus system, the probability of being in S4 for the scheme using a voting logic is

reduced from 7.98×10⁻⁴ to 5.01×10⁻⁴. Moreover, using the logic solver to validate the

decisions made by the redundant line-outage detection systems prior to issuing a trip

decision will lead to a less frequent entry to the SBM states. This optimization in SBM

is more obvious in a SIPS design with a higher level of redundancy (e.g. Arch4).

However, it also inevitably leads to an increased probability of being in State S2 (D&N),

S4 (D&D) and S5 (D&S). For example, when the vetoing logic is used instead of the

voting logic, the probability of Arch4 being in state S2 (D&N) increases from 5.39×10⁻³

to 4.60×10⁻².


Table 7-1: Probability of each Operational State in a System with Two SIPS

Sys States                  Arch1         Arch1         Arch4         Arch4
                            (voting)      (vetoing)     (voting)      (vetoing)
                            Single process/station bus  Dup. process/station bus

Normal (Level 0)
S1   N&N   Pr               9.68×10⁻¹     9.54×10⁻¹     9.94×10⁻¹     9.53×10⁻¹

1 Maloperation (Level 1)
S2   D&N   Pr               3.0×10⁻²      4.52×10⁻²     5.39×10⁻³     4.60×10⁻²
S3   S&N   Fr               1.75×10⁻⁶     1.38×10⁻⁵     2.39×10⁻⁵     1.36×10⁻⁵

2 Maloperations (Level 2)
S4   D&D   Pr               7.98×10⁻⁴     9.87×10⁻⁴     5.01×10⁻⁴     1.00×10⁻³
S5   D&S   Pr               5.94×10⁻⁶     7.03×10⁻⁶     1.40×10⁻⁶     7.06×10⁻⁶
S6   S&S   Fr               4.46×10⁻⁷     4.38×10⁻⁷     4.60×10⁻⁷     4.37×10⁻⁷

Although the probability of two schemes simultaneously being in the failure state (i.e.

S4, S5 and S6) is much lower compared with other states, the probability of unintended

interaction between SIPS may increase dramatically with the number of schemes in the

system. As per Equation (7-2), the failure rates in the Markov Model are proportional to

the number of SIPS in the system (i.e. N). The variation in the probability of

interactions with the number of SIPS in the system for SIPS design Arch4 (Voting) is

given in Table 7-2. Although the probability of scheme interaction is small in the

studied system with two schemes, it may increase dramatically as the number of the

schemes in the system increases. For example, the probability of being in the State 4

(D&D) i.e. two dependability-based maloperations, increases from 0.05% to 2.1% as

the number of SIPS in the system increases from 2 to 10. Therefore, it is necessary

to evaluate the variation in SIPS risk as the number of SIPS increases.

Table 7-2: Variation in the Probability of Interactions between SIPS for Arch4(voting)

No. of SIPS    S4 D&D         S5 D&S         S6 S&S
               Probability    Probability    Frequency
2              5.01×10⁻⁴      1.40×10⁻⁶      4.60×10⁻⁷
3              1.50×10⁻³      4.96×10⁻⁶      1.37×10⁻⁶
4              2.98×10⁻³      1.14×10⁻⁵      2.73×10⁻⁶
5              4.93×10⁻³      2.13×10⁻⁵      4.51×10⁻⁶
6              7.34×10⁻³      3.53×10⁻⁵      6.70×10⁻⁶
7              1.02×10⁻²      5.41×10⁻⁵      9.27×10⁻⁶
8              1.34×10⁻²      7.80×10⁻⁵      1.22×10⁻⁵
9              1.71×10⁻²      1.07×10⁻⁴      1.55×10⁻⁵
10             2.10×10⁻²      1.43×10⁻⁴      1.91×10⁻⁵
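As a rough plausibility check on the S4 (D&D) column, the tabulated values can be compared with a simple scaling of the two-scheme value by the number of SIPS pairs, $C_N^2$. This comparison is an illustrative assumption introduced here, not the method used to produce the table; the function name `pair_scaled_estimate` is hypothetical. In the Python sketch below the S4 values are copied from Table 7-2.

```python
from math import comb

# S4 (D&D) probabilities for Arch4 (voting), copied from Table 7-2.
S4_PROBABILITY = {
    2: 5.01e-4, 3: 1.50e-3, 4: 2.98e-3, 5: 4.93e-3, 6: 7.34e-3,
    7: 1.02e-2, 8: 1.34e-2, 9: 1.71e-2, 10: 2.10e-2,
}


def pair_scaled_estimate(n, base=S4_PROBABILITY[2]):
    """Scale the two-scheme probability by the number of SIPS pairs C(n, 2).

    This is a back-of-envelope approximation, not the Markov solution.
    """
    return base * comb(n, 2)


for n, tabulated in S4_PROBABILITY.items():
    estimate = pair_scaled_estimate(n)
    print(f"N={n:2d}  table={tabulated:.2e}  "
          f"pair-scaled={estimate:.2e}  ratio={estimate / tabulated:.2f}")
```

The pair-scaled estimate stays within roughly 10% of the tabulated values over the range shown and drifts upwards as N grows, which suggests that the growth in the D&D probability is close to, but slightly slower than, the growth in the number of scheme pairs.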


7.3.2. Impact Assessment Results

The impact of the previously discussed 6 different SIPS operational states in the PJM 5-

bus system is investigated under various system conditions. To illustrate the variation in

SIPS impact under various system conditions, numerous case scenarios are generated by

changing the load level and the generation output of each wind farm. Specifically, the

wind levels at each wind farm could be at a low level (0 MW), a medium level (50 MW)

and a high level (100 MW), whilst load levels of 100% and 80% of the peak load are

illustrated.
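The case scenarios described above can be generated as combinations of the wind and load levels. The Python sketch below assumes a full factorial combination of the three wind levels at each wind farm and the two load levels, which gives 18 scenarios; whether every combination was actually simulated is an assumption, and the dictionary keys are hypothetical names.

```python
from itertools import product

WIND_LEVELS_MW = (0, 50, 100)   # low, medium and high output per wind farm
LOAD_LEVELS = (1.0, 0.8)        # fraction of the peak load

# Full factorial combination (assumed): WF1 level x WF2 level x load level.
scenarios = [
    {"wf1_mw": wf1, "wf2_mw": wf2, "load_fraction": load}
    for wf1, wf2, load in product(WIND_LEVELS_MW, WIND_LEVELS_MW, LOAD_LEVELS)
]

print(len(scenarios))
print(scenarios[0])
```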

All the possible consequences caused by SIPS operation and the financial impacts [99]

are listed in Table 7-3. In this study, it is assumed the restoration of a wind farm after a

SIPS operation takes 2 hours. In addition, load shedding caused by cascading failures

takes 5 hours to recover. The impact of each SIPS state at a particular system condition $(g_i, d_i)$ can be calculated as:

$$
\begin{aligned}
Im(g_i, d_i) &= C_L^i(d_i) + C_G^i(g_i) + C_R^i(g_i, d_i) + C_S^i(g_i, d_i) \\
&= VOLL(i) \times t \times 18000 + LOWG(i) \times t \times 120 + \big(LOWG(i) + VOLL(i)\big) \times t \times 50 + C_{GS}
\end{aligned}
\tag{7-13}
$$

where VOLL represents the value of lost load, LOWG is the loss of wind generation, $C_{GS}$ is the generator start-up cost and $t$ represents the duration of the impact on the system.
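The cost expression above can be evaluated with a few lines of code. The Python sketch below is a hedged illustration: the function name `impact` is hypothetical, VOLL(i) and LOWG(i) are interpreted as the lost load and lost wind generation in MW (an assumption about their units), and the 10 MW and 100 MW quantities in the example calls are made up. The cost coefficients, durations and start-up costs are those quoted in this section.

```python
LOAD_SHEDDING_COST = 18000.0    # $/MWh
WIND_CURTAILMENT_COST = 120.0   # coefficient of the wind term in the equation
REDISPATCH_COST = 50.0          # $/MWh


def impact(lost_load_mw, lost_wind_mw, duration_h, startup_cost):
    """Impact in $ of one SIPS state for one system condition.

    lost_load_mw and lost_wind_mw are assumed to be MW quantities.
    """
    load_shedding = lost_load_mw * duration_h * LOAD_SHEDDING_COST
    wind_curtailment = lost_wind_mw * duration_h * WIND_CURTAILMENT_COST
    redispatch = (lost_wind_mw + lost_load_mw) * duration_h * REDISPATCH_COST
    return load_shedding + wind_curtailment + redispatch + startup_cost


# Made-up DBM case: 10 MW of load shed for 5 hours, 5000 $ generator start-up.
print(impact(lost_load_mw=10.0, lost_wind_mw=0.0, duration_h=5.0,
             startup_cost=5000.0))
# Made-up SBM case: 100 MW wind farm tripped for 2 hours, 10000 $ re-start.
print(impact(lost_load_mw=0.0, lost_wind_mw=100.0, duration_h=2.0,
             startup_cost=10000.0))
```

With these inputs the load shedding term dominates the DBM case by more than two orders of magnitude, which is in line with the observation that DBM related states carry the largest impact.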

Table 7-3: Impact Assessment Data of Different SIPS Operation [99]

State                 Cost Items           Duration (hrs)   Cost ($/MWh or $/Case)
SIPS Operation/SBM    Wind Curtailment     2                120 $/Case
                      Wind Farm re-start   -                10000 $/Case
                      Re-dispatch          2                50 $/MWh
SIPS DBM              Load shedding        5                18,000 $/MWh
                      Generator start-up   -                5000 $/Case
                      Energy Re-dispatch   5                50 $/MWh

The impacts of the studied SIPS operational states under particular system conditions

are illustrated in Figure 7-6; the DBM related states are in Figure 7-6 (a), whilst the

normal operation and SBM related states are in Figure 7-6 (b). The impacts caused by


DBM related states are considerably larger than those of the normal operation and SBM related

states. This is caused by the considerably higher severity of the DBM consequences (i.e.

isolation of entire generation plant and load shedding).

The impact of DBM related operational states increases with the load level and the wind

farm output. This is because when the load and wind output are high, there is a higher

probability of cascade tripping following an initial SIPS DBM and more severe

financial consequences (e.g. higher VOLL). Under most system operating conditions, a

single scheme DBM (S2) has a limited impact on the system as compared with the Level

2 DBM related states (i.e. S4, S5). The worst case scenario is brought by S5

(DBM&SBM) when the system is stressed with heavy loading and high wind output.

This indicates that the SBM of one scheme could lead to rescheduling of generation

and changes to the power flow, which may stress other parts of the system. If this is

followed by the DBM of the other scheme, the result is cascade line outages and serious

economic impact. When operating as designed (i.e. S1), the operational cost reaches the

highest level when both the wind and load levels are high, since in this case both wind

farms have to be disconnected to ensure stability after a critical line outage. The impact

of SBM is only related to the wind power output of the two wind farms at the time of

scheme maloperation. The economic impact in this case is from wind curtailment,

energy redispatch and start-up of the tripped generators. From the impact assessment

results, it can be seen that the SIPS operational states associated with unintended

interactions could have a wider influence on the system and may lead to greater

economic impact as compared with individual SIPS maloperation.


Figure 7-6: Impact Assessment Results under Various System Conditions: (a) Impact of

DBM related States. (b) Impact of SBM related States.

7.3.3. Risk Assessment Results

The risks introduced by implementing SIPS with different tripping logic and

communication architectures are illustrated and compared in this section. With the

application of the WAMPAC system, measurements from distributed substations can be

centralised and used by the central controller for SIPS decision making. Line-outage

information from both local and remote substations, especially that related to critical

lines, can be collected for centralised decision making to ensure enhanced dependability.

The performance of SIPS communication architectures introduced in Figure 5-2 is

evaluated in the 5-bus system with two schemes. Figure 7-7 shows the risks of different

SIPS designs, with the contribution of each SIPS operational state represented using

different colours in each column. For the local SIPS shown in Figure 7-7, only local

measurements are used for SIPS decision making. The scheme design with full

redundancy and a voting tripping logic (i.e. Arch4 (voting)) delivers the optimal overall

performance with a risk of 12690 $/year. The implementation of a redundant

communication network proved to be an effective way to enhance scheme dependability,

whilst the use of vetoing logic in local SIPS leads to a higher overall risk due to the

increased risk of DBM. For example, Arch4 (vetoing) has a DBM related risk of 20030

$/year as compared to a DBM related risk of 6967 $/year for Arch4 (voting). More


specifically, the cost of SIPS normal operation stays at approximately 2400 $/year for

all the communication architectures and tripping logics and is mainly determined by the

operational frequencies of the two SIPS in the system. Risks caused by the DBM of one

of the SIPS are the main contribution to the total risks for most designs in this numerical

study, because of the considerably larger impact of DBM as compared to normal

operation or SBM and higher probability as compared to SIPS interactions.

Implementing a duplicated process bus communication system at substation bay level and

duplicated LANs in accordance with the parallel redundancy protocol (PRP) can

significantly enhance the performance in terms of dependability. Nevertheless, the

redundant communication system may also lead to higher risk in SBM and SBM related

interactions. For example, when a voting tripping logic is used, the introduction of the

duplicated PRP LANs (i.e. Arch4(voting)) in the substation automation system will

increase the Risk(SBM) from 6059 $/year to 6967 $/year.

Figure 7-7: Annual Risk Induced by Different Local SIPS Designs

When a system wide centralised SIPS is used, additional sources of line outage

information can be obtained from both local and remote substations. Consequently, the

level of redundancy in the scheme activation signal is relatively high. As shown in

Figure 7-8, the risks are mainly from SBM related operational states. Therefore, the use

of vetoing logic can effectively reduce the risk of SBM without significantly increasing

the risk of DBM. For example, by using the vetoing tripping logic in a system wide

SIPS with Arch2, the risk of SBM can be controlled at a relatively low value and the



optimal overall performance can be achieved. Meanwhile, when the redundancy level in

the SIPS local communication network is low (e.g. Arch1 without redundant process

bus and LANs in the SAS), the use of system-wide centralised SIPS could effectively

reduce its operational risks by enhancing its dependability performance.

Figure 7-8: Annual Risk Induced by Different System-Wide Centralised SIPS Designs

As shown in Table 7-2, the increased number of SIPS leads to higher probabilities of

SIPS interactions. Therefore, it is necessary to estimate the trend in the risk associated

with SIPS operation as more SIPS are being implemented. To assess the operational risk

of a system with multiple SIPS, system studies are required to evaluate the

consequences of each SIPS on system operation. By assuming the impact assessment

results in Figure 7-6 to be the average impact of all the SIPS in the system, the

operational risk can be estimated by multiplying the probabilities of different

operational states with associated impact.
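The estimate described above can be sketched as a probability-weighted sum over the SIPS operational states. The following Python fragment is only an illustration: the state names follow the text, but every frequency and impact value is a hypothetical placeholder and not a result from this study.

```python
# Hypothetical annual frequencies (occurrences/year) and impacts ($/occurrence).
# None of these numbers come from the thesis; they only illustrate the calculation.
STATE_FREQUENCY = {"normal": 2.0, "DBM": 0.01, "SBM": 0.05, "interaction": 0.001}
STATE_IMPACT = {"normal": 1200.0, "DBM": 500000.0, "SBM": 60000.0, "interaction": 900000.0}


def annual_risk(frequency, impact):
    """Return the per-state risk ($/year) and the total risk ($/year)."""
    per_state = {state: frequency[state] * impact[state] for state in frequency}
    return per_state, sum(per_state.values())


per_state, total = annual_risk(STATE_FREQUENCY, STATE_IMPACT)

# Share of the total risk contributed by each operational state.
shares = {state: risk / total for state, risk in per_state.items()}

for state, risk in per_state.items():
    print(f"{state:12s} {risk:10.1f} $/year ({100.0 * shares[state]:.1f}%)")
```

Repeating the calculation with the state probabilities for a larger number of schemes would give a trend of the kind discussed next, under the stated assumption that the impacts are averages over all SIPS.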

Figure 7-9 shows the variation in SIPS risk against the number of schemes implemented

in the system. It can be seen that the risks caused by all the four SIPS operational states

(i.e. Normal operation, DBM, SBM and Interaction) increase with the number of

schemes in the system. When there are more than 10 schemes in the system, risk of

SIPS interactions becomes the greatest contributor, taking up to 33.9% of the overall

risk. This indicates that although the risk from SIPS interaction may be small currently,

it could increase dramatically following a widespread implementation of

SIPS in the future.



Figure 7-9: Variation in Risks (Arch4 voting) with Number of SIPS in System

7.4. System Planning Incorporating SIPS

Although SIPS can provide a less expensive way to fulfil the reliability requirements of

a Power System, the risk assessment indicates that a high penetration of SIPS results in

increased risks due to the high probability of undesirable interactions and increased

operational complexity. Additionally, new challenges are brought to the transmission

system due to the continuous integration of intermittent renewable generation. Another

way to effectively alleviate congestion and to allow more wind and PV integration is

transmission network upgrading. However, due to the considerable cost required,

transmission upgrading is carried out in conjunction with SIPS to enable an effective

trade-off between operating and investment cost and system integrity. SIPS operation

under future generation and transmission upgrade scenarios is evaluated to determine

the optimal transmission and generation expansion plan.

The risk assessment method is now used to illustrate the variation in SIPS risk in a

planning horizon of 25 years, incorporating transmission upgrading, demand increase

and wind integration. It is assumed demand increases by 1% per annum at each load

point. In order to fulfil the ‘Gone Green Plan’ proposed by UK National Grid, the wind

capacity of both wind farms is increased from 100MW to 200MW in a step of 25 MW

every 5 years. This will ensure wind energy will supply 26.6% of the total energy at the

end of the planning horizon. The Locational Marginal Price (LMP) [100] is introduced to

determine the candidate lines for transmission expansion. By comparing the LMP at

each bus, the candidate line is built to connect the bus with the lowest LMP to the bus with the

highest LMP. L1-3 (i.e. the circuit connecting buses 1 and 3) is identified as the candidate line

using this method. In addition, the introduction of the new transmission line will reduce

the number of times that SIPS is required to operate and allow greater power transfer from the

power plant with the lowest cost. Therefore, the annual production cost and the

frequency of wind curtailment after transmission expansion are examined.
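The LMP-based selection rule can be sketched in a few lines of Python. The LMP values below are hypothetical and were chosen only so that the example reproduces the L1-3 outcome stated in the text; the function name `candidate_line` is likewise introduced here for illustration.

```python
def candidate_line(lmp_by_bus):
    """Return (low_bus, high_bus): the buses with the lowest and the highest LMP."""
    low_bus = min(lmp_by_bus, key=lmp_by_bus.get)
    high_bus = max(lmp_by_bus, key=lmp_by_bus.get)
    return low_bus, high_bus


# Hypothetical LMPs in $/MWh for the five buses (not taken from the study).
example_lmp = {1: 16.0, 2: 24.0, 3: 31.0, 4: 28.0, 5: 18.0}

low, high = candidate_line(example_lmp)
print(f"Candidate line: L{low}-{high}")
```

In practice the LMPs would come from the optimal power flow solution for each system condition, so the selected pair of buses could differ between operating scenarios.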

[Diagram: buses B1 to B5, lines L1 to L6, generators G1 to G4, wind farms WF1 and WF2 (100 MW each) and the candidate line L1-3]

Figure 7-10: PJM 5-Bus System with Transmission Expansion

The variation in SIPS operational risks in the 5-bus system over the 25-year planning

horizon is shown in Figure 7-11. It is assumed that a new transmission line L1-3 is built

in the 20th year. Due to the continuously increasing wind integration and load demand,

the operational risks of each of the four SIPS operational states continue to increase. If

transmission upgrading is not implemented, at the end of the 25-year period, the SIPS

risk will increase from 12690 $/year to 40770 $/year. Congestion in the system

will significantly impede the ability to integrate large scale renewable generation.

Building new transmission lines linking the wind rich areas to the load centre (e.g. L1-2,

L1-3) can effectively relieve congestion and significantly reduce the risks

introduced by SIPS operation, effectively keeping overall risks within 25000 $/year.

This is achieved due to the reduced operation frequency of SIPS-1 and SIPS-2.

However, it leads to a noticeable increase in the risk of SBM, because the schemes in

the system have a new transmission line to monitor during a system disturbance.



Figure 7-11: Variation in SIPS risks in a planning horizon of 25 years

The impact of line expansion on system production cost and wind curtailment by SIPS

is shown in Table 7-4. Despite the considerably higher investment cost incurred by

transmission expansion, the total production cost can be reduced by transferring large

amounts of cheaper energy from the wind rich areas to the load centre. The reduction in

production cost also increases with time as the wind generation and load demand

increase. For example, at year 25, the introduction of L1-3 reduces production cost by

3.62×10⁵ $/year. In addition, the wind curtailment due to SIPS operation also reduces

when either L1-2 or L1-3 is introduced. By integrating the SIPS risks into the

transmission expansion model, a SIPS-aided transmission expansion plan can be carried

out to minimize production and investment costs.

Table 7-4: System Production Cost and Wind Curtailment with Simulation Year

        Production Cost ($/year)           Wind Curtailment (MWh/year)
Year    No L1-3      L1-3 (reduction)      No L1-3      L1-3
1       7.12×10⁷     -1.58×10⁵             6.91         1.96
10      8.93×10⁷     -2.22×10⁵             18.89        7.10
20      1.13×10⁸     -3.01×10⁵             40.18        17.85
25      1.25×10⁸     -3.62×10⁵             49.99        24.54
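To put the tabulated figures in relative terms, the short Python sketch below converts each row of Table 7-4 into percentage changes. It assumes that the "L1-3 (reduction)" column is the change in production cost relative to the "No L1-3" column; the dictionary keys are names introduced here for illustration.

```python
# Values copied from Table 7-4; key names are illustrative only.
TABLE_7_4 = {
    1: {"cost": 7.12e7, "cost_change": -1.58e5, "curt_no_line": 6.91, "curt_line": 1.96},
    10: {"cost": 8.93e7, "cost_change": -2.22e5, "curt_no_line": 18.89, "curt_line": 7.10},
    20: {"cost": 1.13e8, "cost_change": -3.01e5, "curt_no_line": 40.18, "curt_line": 17.85},
    25: {"cost": 1.25e8, "cost_change": -3.62e5, "curt_no_line": 49.99, "curt_line": 24.54},
}


def relative_changes(row):
    """Return (production cost change in %, wind curtailment change in %)."""
    cost_pct = 100.0 * row["cost_change"] / row["cost"]
    curt_pct = 100.0 * (row["curt_line"] - row["curt_no_line"]) / row["curt_no_line"]
    return cost_pct, curt_pct


for year, row in TABLE_7_4.items():
    cost_pct, curt_pct = relative_changes(row)
    print(f"Year {year:2d}: production cost {cost_pct:.2f}%, wind curtailment {curt_pct:.1f}%")
```

Under that assumption, at year 25 the new line reduces production cost by roughly 0.29% and the wind curtailment attributed to SIPS by roughly 51%.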

7.5. Sensitivity Study

Sensitivity analysis is carried out to evaluate the impact of the high uncertainty in the

data used in the simulation on the risk assessment results. The reliability of the

components used in a scheme’s communication system is determined by its failure rate

(λ) and the repair rate (μ). The failure rate is expected to vary over the life cycle of a



device and also with different manufacturers of the device. The frequency of inspection

and maintenance of the components, reflected as repair rate, also has a great impact on

scheme risks. In addition, as illustrated in the component-level Markov Model in

Section 5.4.2., the relationship between the failure rates of detectable DBM (λdd),

undetectable DBM (λud) and SBM (λst) complies with the equation:

λdd : λud : λst = 2 : 1 : 2. Different devices might have different self-monitoring

abilities, which will lead to significant variation in the probabilities of being in each

failure mode.
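As a small numerical illustration of how the ratio and the repair rate enter the calculation, the Python sketch below splits a total failure rate according to 2 : 1 : 2 and evaluates the steady-state probability of a failed state. It uses a simplified two-state (up/down) Markov model per failure mode, which is an assumption made here for brevity and not the component-level model of Section 5.4.2; the numerical rates are hypothetical.

```python
def split_failure_rate(total_rate, ratio=(2.0, 1.0, 2.0)):
    """Split a total failure rate into (lambda_dd, lambda_ud, lambda_st) by the given ratio."""
    weight = sum(ratio)
    return tuple(total_rate * part / weight for part in ratio)


def steady_state_unavailability(failure_rate, repair_rate):
    """Steady-state probability of the failed state in a two-state Markov model."""
    return failure_rate / (failure_rate + repair_rate)


total_lambda = 0.05   # hypothetical total failure rate, failures/year
mu_detected = 365.0   # hypothetical repair rate for detected failures, repairs/year
mu_undetected = 1.0   # hypothetical: undetected failures found at a yearly inspection

lam_dd, lam_ud, lam_st = split_failure_rate(total_lambda)
print("P(detectable DBM state)   =", steady_state_unavailability(lam_dd, mu_detected))
print("P(undetectable DBM state) =", steady_state_unavailability(lam_ud, mu_undetected))
print("P(SBM state)              =", steady_state_unavailability(lam_st, mu_detected))
```

Even in this simplified form, the undetectable mode dominates the failed-state probability because its effective repair rate is set by the inspection interval rather than by self-monitoring.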

Figure 7-12: Impact of Variation in Reliability Data on SIPS Risks

The impact of these uncertainties on the simulation results is evaluated by performing a

sensitivity study. As shown in Figure 7-12, using a more reliable device with a lower failure

rate always leads to a better overall performance. Meanwhile, by increasing the

repair rate, the overall risk brought by the schemes can also be reduced. The repair rate

of a device’s undetectable DBM (µud) can be increased by more frequent scheme

inspection and maintenance. For the failures that can be detected by the self-monitoring

function (i.e. detectable DBM, SBM), a more timely replacement of the faulty device

could effectively reduce the operational risks. With an enhanced self-monitoring ability,

a higher percentage of DBM failures can be detected by the device (i.e. increased α). In

this case, the operational risk of the system can be significantly reduced. For example, if

the percentage of the detectable DBM increases from the original 40% to 60%, the total

annual risks induced would decrease from 12,662 $/year to 9,240 $/year. Consequently,

allocating more maintenance effort to the critical components and the condition

monitoring of these devices could be effective in mitigating SIPS operational risk.



7.6. Managing the SIPS Risk Using Adaptive SIPS

As described in the sensitivity study, the increasingly variable operating conditions of

Power Systems significantly affect the SIPS performance. Therefore, it is difficult for a

single SIPS design to be suitable for all the weather and system conditions. The

introduction of WAMPAC offers an opportunity to improve the performance of SIPS

using more adaptive and intelligent protection logics. This allows the system operator to

shift the balance between system dependability and security according to the current

system conditions.

As shown in Figure 7-13, the information collected by the sensors located at each

substation and wind farm is centralised using the WAN. An adaptive SIPS is designed

to select the optimal operational logic for all the SIPS in the system based on the current

system conditions. The wind generation outputs of the wind farms and the load levels of

all load points are collected by the local sensors and then sent to the controller via the

WAN. The centralised controller is used to estimate the risk of different SIPS

operational logics using the risk assessment method proposed in this thesis and choose

the most suitable operating logic to achieve the minimum operational risk and the

optimal SIPS coordination.
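The selection step performed by the centralised controller can be sketched as an exhaustive search over the available logics. In the Python fragment below, `toy_risk_estimator` is a hypothetical stand-in for the risk assessment method, with made-up probabilities and weights; only the search structure (evaluate every combination, keep the one with the lowest estimated risk) reflects the description above.

```python
from itertools import product

LOGICS = ("voting", "vetoing")
SIGNALS = ("local", "system-wide")


def toy_risk_estimator(config, condition):
    """Placeholder risk model in $/hour; NOT the risk assessment used in the study."""
    load, wind = condition["load"], condition["wind"]
    risk = 0.0
    for (logic, signals), wind_level in zip(config, wind):
        # Hypothetical per-scheme probabilities of DBM and SBM.
        p_dbm, p_sbm = (0.01, 0.05) if logic == "voting" else (0.04, 0.01)
        if signals == "system-wide":
            # Assumed effect of extra remote signals: fewer misses, more spurious trips.
            p_dbm, p_sbm = p_dbm * 0.5, p_sbm * 1.5
        risk += 50.0 * load * wind_level * p_dbm + 10.0 * (1.0 - load) * p_sbm
    return risk


def select_configuration(risk_estimator, condition):
    """Return (config, risk) with the lowest estimated risk.

    A config is ((logic, signals) for SIPS-1, (logic, signals) for SIPS-2).
    """
    options = list(product(LOGICS, SIGNALS))
    best = None
    for config in product(options, repeat=2):
        risk = risk_estimator(config, condition)
        if best is None or risk < best[1]:
            best = (config, risk)
    return best


low_stress = {"load": 0.3, "wind": (0.1, 0.1)}
high_stress = {"load": 0.9, "wind": (0.9, 0.9)}
print(select_configuration(toy_risk_estimator, low_stress))
print(select_configuration(toy_risk_estimator, high_stress))
```

With only two schemes and four options each, exhaustive search over 16 combinations is sufficient; a larger number of schemes would need a more selective search.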

In this case study, each of the two GRS (i.e. SIPS-1 and SIPS-2) can adjust its tripping

algorithm to use either a “voting” or a “vetoing” logic. In addition, each GRS can also

decide whether to use additional activation signals from the remote substations (i.e. B2

and B3) to enhance the dependability of its operation.

Figure 7-14 shows the variation in the SIPS operational risks in the studied PJM 5-bus

system with varying system conditions during a winter month. The operational risk

when both GRS are local schemes with Arch4(voting) design is compared with the risk

induced by the adaptive SIPS. The adaptive SIPS offers a noticeable reduction in

operational risk compared with the conventional design. In addition, it also helps reduce

the probability of cascading failures when the system is heavily stressed with high wind

output and load demand. The operating logic of the adaptive SIPS is switched 121 times

in 720 hours, leading to an average time interval of 5.95 hours per switch.



Figure 7-13: Adaptive SIPS using WAMPAC Platform



Figure 7-14: Variation in SIPS Risks under Different System Conditions

In particular, when the load level and the wind level of both wind farms are relatively

low (e.g. at time t1), the system has sufficient generation reserve and transmission

facilities, which provide robust alternative paths following contingencies.

Therefore, SIPS DBM has limited impact on the system under the studied system

condition. The optimal performance in this situation is achieved when the vetoing logic

is applied to both SIPS-1 and SIPS-2, which provides the best performance in

terms of system security. At t2, when the load level in the system is predicted to increase to 81.9%

and the wind speed at WF1 is predicted to be high, SIPS-1 switches to the voting logic

and collects line-outage signals from both local and remote substations to maximize its

dependability. With SIPS-1 being highly dependable and with a low wind level at WF2,

the consequences following a SIPS-2 DBM can be effectively mitigated by the operation of

SIPS-1, making SIPS-1 critical in maintaining scheme dependability. Accordingly, the

vetoing logic is used by SIPS-2 to reduce the security risks without significant



compromise in system dependability. Consequently, the overall risk can be reduced

from 5.2 $/hour to 2.9 $/hour. At t3, with the increase in the wind level at WF2, SIPS-2

starts to use the remote signals to enhance the dependability. This results in a reduction in

operational risk from 1.4 $/hour to 0.61 $/hour.

Figure 7-15 illustrates the variations in SIPS operational risks in three typical days. The

changes in the operational logic for the adaptive SIPS over a 24-hour period are

shown in Figure 7-16. It can be seen that during the night (i.e. 23:00-6:00), when the

load level is low, the consequences following a SIPS dependability-based maloperation

are relatively minor. Consequently, the most secure operational logic, i.e. the “vetoing”

logic, is used by both SIPS in the system to ensure the highest security. During the

daytime, when the load level is relatively high, wide-area information is then used as

redundant activation signal to enhance scheme dependability. When the wind levels at

both wind farms are high (e.g. Scenario 1 and Scenario 2), the operational risks of the

system can be effectively kept at a low level by shedding the wind farms

during contingencies.

Figure 7-15: Variations in SIPS Risks in Three Typical Days



The highest operational risk occurs during a low wind level and a high load demand

system condition (i.e. Scenario 3). The operation of the GRS on the wind farms has

limited impact on relieving system congestions. During the peak load period of the day

in Scenario 3 (i.e. 17:00-19:00), SIPS-2 is switched to the most secure “Vetoing” logic

to prevent the spurious tripping of wind farm 2 (WF2). This is vitally important when the

load demand is high and the generation reserve is not sufficient. In this case, the trip of

WF2 may lead to increased generation output at the plant at B1 and eventually cause

overloading on other parts of the system. Load shedding schemes are required to

prevent system cascading failure by disconnecting some of the load.

It can be concluded that SIPS based on the predetermined operational logic may not

necessarily deliver the optimal operation. The hierarchically layered control actions and

the continuously varying system conditions require system operators to make control

decisions based on real-time data and a system-wide view. Therefore, the key to

achieving effective SIPS applications lies not only in the measurement IEDs and

communication infrastructures, but also in the fast computing and data processing

computers and analysis software tools that offer valid solutions for various system

conditions. The proposed risk assessment procedure offers an effective method to

ensure optimal SIPS performance and supports system operators in decision making

during severe system contingencies.

[Chart: operational logic of SIPS-1 and SIPS-2 from 0:00 to 24:00 for Scenarios 1 to 3; the states shown are Vetoing, System-wide Voting and System-wide Vetoing]

Figure 7-16: Operational Logics of Adaptive SIPS during a Day for each Scenario

7.7. Summary

This chapter provides a procedure to assess the impact of undesirable interactions

between SIPS. The evaluation results indicate that SIPS maloperations and interactions



introduce additional risks to the system. Different SIPS interaction scenarios in the PJM

5-bus system with two GRS are studied. A SIPS-aided transmission expansion plan was

carried out to illustrate the impact of future energy integration and transmission

expansion on SIPS risks.

Unintended interactions between SIPS could result in cascade failures and lead to a

more severe impact compared with individual SIPS failure. In addition, the operating

risk exposure of SIPS, and especially risks caused by SIPS interaction, would increase

significantly with greater wind integration and as the number of SIPS increases. The

probability and severity of unintended SIPS interactions are highly related to the system

condition. Under stressed system conditions with a high load demand and generation

output, cascading SIPS operation is more likely to occur.

In the near future, system operators will face more severe challenges in managing the

operational risk of SIPS. Therefore, SIPS within the context of Power System long-term

planning is considered. The continuously increasing wind penetration and load level

will increase the operational cost of SIPS. Building new transmission circuits could

significantly reduce SIPS risk, decrease wind curtailment and allow more access to the

cheaper energy in the system. Therefore, a SIPS assisted transmission upgrading plan

helps maximize reliable system operation whilst minimizing the production and

investment costs.

A new type of SIPS with adaptive operational logic, which adjusts to the various system

conditions, is developed to manage SIPS-induced risks. Currently, the existing SIPS are

built based on predetermined seasonal and off-line mitigation actions. The adaptive

SIPS, made available by the modern IEDs and system wide monitoring system, allows

system operators to shift the balance between system dependability and security. When

the system is less stressed, with relatively low load level and generation output, the risk

from SBM is the main source of SIPS operational risk. Consequently, more secure

protection logic can be implemented. In contrast, when the system is heavily stressed

or when the other scheme is in a failure state, the successful operation of the SIPS is

vitally important in preventing system cascading failure and the SIPS needs to be highly

dependable. The adaptive protection not only helps reduce the risk of DBM and SBM of

individual SIPS, but also helps achieve better coordination between SIPS. The



significant variation in SIPS operational risk due to the fast changing system conditions

can be effectively mitigated.

The proposed methodology can help utilities understand the impact of advanced ICT on

SIPS reliability and quantify the continuously increasing probabilities of unintended

interactions among SIPS on the same or neighbouring systems. It also helps ensure

optimal SIPS performance and supports system operators in decision making during

severe system contingencies.


CHAPTER 8

CONCLUSIONS AND FUTURE WORK

8.1. Introduction

Continuously growing energy demand, the connection of bulk renewable generation and

the deregulation of the electric energy market have brought significant challenges to the

reliability of a Power System. System Integrity Protection Schemes (SIPS), fully

integrated with ICT, modern IEDs and advanced control algorithms, are now being

implemented by the system operators to minimize the probability of large system

disturbances and to fulfil the strict requirements for overall Power System reliability.

However, changes in the ICT infrastructure and ageing protection assets raise major

concerns about the reliability of the protection system and its impact on system

operation, especially during severe operating conditions or Power System contingencies.

The aim of this research is to provide an insight into the reliability of the protection

schemes installed in the transmission network. To achieve this, the main causes of SIPS

maloperations and their impact on system operation are studied. Probabilistic based

reliability assessment models have been developed to quantitatively assess the risk of a

SIPS and to determine the optimal SIPS design and operational logic. Furthermore,

reliability enhancement methods and design considerations selected to improve SIPS


performance are discussed. The main conclusions drawn from this research and

suggestions for the future work are summarised in the following sections.

8.2. Conclusions

SIPS are a cost effective and easy to implement method of enhancing Power System

reliability and maximizing its transfer capacity. The use of SIPS becomes increasingly

attractive as the level of renewable generation increases drastically and societal policies

create a stagnant transmission upgrade policy. Although SIPS provide corrective control

actions for various abnormal system conditions and are designed to preserve system

integrity, failure to fulfil their reliability requirements will expose the Power System to

additional risks.

Major SIPS related system disturbances reveal the consequences of a SIPS maloperation.

In the pre-cascading phase, the effective and quick operation of SIPS is vital in

preventing the spreading of a disturbance. Incorrect or delayed operation, or the failure of SIPS to

operate, increases the probability of the system entering the cascading phase and may

eventually lead to severe consequences such as load disconnection. The surveys

conducted by the IEEE-CIGRE working group show that SIPS may normally fail in

two ways: 1) Dependability-based maloperation (DBM), which is a failure to operate

when the SIPS is required. 2) Security-based maloperation (SBM), which means an

unwanted SIPS operation when there is no disturbance in the system. In addition, due to

the high penetration of SIPS in many Power Systems, the complexity of system

operation has significantly increased in recent years. This also leads to a higher

probability of undesired interactions between SIPS located on the same or neighbouring

systems.

A review of the system disturbance reports issued by NERC indicates the main causes

of SIPS related events are hardware or software failures, faulty design logic and human

errors. In addition, the majority of the recorded events were caused by SIPS security-

based maloperations. The consequences following a SIPS failure to operate (i.e. DBM)

are normally much more severe than those of unnecessary SIPS operations (i.e. SBM).

Nevertheless, SBM, although it normally has limited consequences for system

operation, needs to be effectively considered in the risk assessment model due to its



greater likelihood of occurrence. The aim is to balance the trade-off between scheme

dependability and security in SIPS operation.

The importance of SIPS reliability has been recognized by utilities and is addressed in

SIPS design and operation. Reliability standards are developed to evaluate the

performance of the protection schemes and ensure they fulfil the strict reliability

requirements. The review of existing SIPS applications illustrates industry practices and

their attempts to use new technologies for monitoring, communication and control to

further enhance SIPS performance. The advances in ICT enable real time monitoring of

system conditions and provide a more accurate state estimation to facilitate the decision

making of SIPS. This also brings more flexibility in SIPS design and opens up more

solutions in enhancing SIPS performance. However, changes in the instrumentation,

monitoring, protection and control systems also raise major concerns about the overall

reliability of SIPS and need to be considered in the reliability assessment.

Substations, as key components of a power grid, play a vital role in monitoring and

controlling power flows and interconnecting generating facilities, transmission and

distribution networks and customers. Successful operations of both local and system-

wide SIPS are heavily reliant on the monitoring, communication and control functions

in the substation automation system (SAS). The impact of component reliability on the

performance of different communication services in an IEC 61850 based substation is

discussed in Chapter 3. This is demonstrated by studying the availability of both

reporting and multi-casting communication services in the SAS. Component reliability,

system architecture and maintenance strategies are the main factors affecting the

reliability of the substation communication services. The implementation of redundancy

is also shown to be an effective method to improve system performance by eliminating single

points of failure. The inherent redundancy of the RSTP ring station bus, and the

redundant communication paths implemented in accordance with IEC 62439,

significantly improve the MTTF of the data transfer in SAS. In addition, fast

identification and repair of failed components are also vitally important.
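The combined effect of redundancy and repair time can be illustrated with a standard series/parallel calculation. The Python sketch below works in terms of steady-state availability rather than MTTF, assumes independent component failures, and uses hypothetical MTTF and MTTR values; it is not the substation model of Chapter 3.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability of a repairable component."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*availabilities):
    """All components must work (no redundancy)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """At least one branch must work (redundant branches, independent failures)."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


# Hypothetical devices on one communication path: a merging unit and an Ethernet switch.
merging_unit = availability(mttf_hours=150000.0, mttr_hours=24.0)
switch = availability(mttf_hours=100000.0, mttr_hours=24.0)

single_path = series(merging_unit, switch)
duplicated_path = parallel(single_path, single_path)

print(f"Single path unavailability:     {1.0 - single_path:.3e}")
print(f"Duplicated path unavailability: {1.0 - duplicated_path:.3e}")
```

Reducing `mttr_hours` in the sketch lowers the unavailability of both arrangements, which mirrors the point about fast identification and repair of failed components.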

The protection devices are identified as one of the most critical components in the SAS,

as they involve more hardware devices, software routines, firmware modules and user

defined settings than other types of equipment used in the SAS. Tripping signals from

protection devices are frequently used by SIPS for fast detection of a line outage. The



UK transmission network predominantly utilizes electronic or early numerical based

protection equipment to detect and clear short circuit faults. A significant number of

these protection devices are now reaching their predicted design lifetime of 25 years.

Consequently, the reliability and performance of the local protection devices need to be

investigated to ensure they are still in their reliable service lifetime, and furthermore, to

ensure they do not adversely affect the operation of SIPS that utilise their output

responses. A life-time assessment carried out in Chapter 4 evaluates the operational

conditions and identifies the life-limiting elements of the most commonly used

electronic relay types in the UK National Grid 400 kV and 275 kV transmission

networks.

The protection maloperation record indicates that all the selected relay types are serving

in a highly reliable manner, with no statistical evidence of vulnerable components or

modules. In addition, all of the three relay types offer equal performance in operational

speed and accuracy for their intended functions as compared to modern relay types. The

components most vulnerable to the thermal stress, high current stress and voltage stress

are identified as life-limiting elements and then examined via a 3D X-ray micro

tomography study. No signs of degradation or wear-out were identified. The study

helped National Grid extend the reliable service life of these protection relays for an

initial extension period of five years. It was concluded that reliability performance equal

to that of replacement relays can be achieved by retaining them. In addition,

risks of infant mortality failures or initial application problems associated with

replacement relays can be avoided. This study also ensures the successful operation of

SIPS, which relies on reliable and timely operations of local protection.

To better manage the additional risks brought by SIPS, studies are required to evaluate

the impact of SIPS failures on the Power System and use the results to develop

appropriate reliability assessment models. An analytical risk assessment method based

on the reliability block diagram and the Markov model was developed in Chapter 5 and

used to quantify the risks associated with SIPS during normal operation, dependability-

based maloperation and security-based maloperation. By performing FMEA, all the

different failure modes co-existing in a SIPS component and their impact on the overall

SIPS reliability are identified. Probabilities of different SIPS maloperations are

estimated by combining SIPS operational modes with different system events. The risk



of SIPS operation is calculated as the probability of each failure state weighted by its

corresponding financial impact.

This procedure is used to illustrate how the different SIPS communication architectures

could affect SIPS performance. The implementation of a duplicated communication

network can significantly enhance the performance in terms of scheme dependability.

However, more redundancy may not necessarily result in better overall performance,

since it also leads to increased security risk. The method is next used to compare the

performance of SIPS with two different tripping logics: voting logic and vetoing logic.

The impact of the two tripping logics on the trade-off between dependability and

security in SIPS design was also studied. SIPS with a 1-out-of-2 voting logic delivers

better performance in dependability. However, due to the high level of redundancy in

the SIPS design, the probability of unwanted SIPS operation is increased due to the

misinterpretation of inputs or data. In this case, the vetoing logic can be used to

effectively prevent spurious SIPS operations and mitigate the security risks.
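The dependability/security trade-off between the two logics can be illustrated with a simple two-channel probability model. The Python sketch below assumes two independent activation channels and models voting as 1-out-of-2 (OR) and vetoing as 2-out-of-2 (AND); the AND interpretation of vetoing, the function names and the per-channel probabilities are all assumptions made for illustration and may differ from the logic defined in Chapter 5.

```python
def voting_1oo2(p_miss, p_false):
    """1-out-of-2 (OR): the scheme operates if either channel asserts."""
    p_dbm = p_miss ** 2                  # both channels must miss a real event
    p_sbm = 1.0 - (1.0 - p_false) ** 2   # either channel can cause a spurious trip
    return p_dbm, p_sbm


def vetoing_2oo2(p_miss, p_false):
    """Modelled here as 2-out-of-2 (AND): either channel can veto the trip."""
    p_dbm = 1.0 - (1.0 - p_miss) ** 2    # one missed detection blocks operation
    p_sbm = p_false ** 2                 # both channels must assert spuriously
    return p_dbm, p_sbm


# Hypothetical per-channel probabilities: missing a real event, asserting spuriously.
p_miss, p_false = 0.02, 0.01

for name, logic in (("voting", voting_1oo2), ("vetoing", vetoing_2oo2)):
    p_dbm, p_sbm = logic(p_miss, p_false)
    print(f"{name:8s} P(DBM) = {p_dbm:.6f}  P(SBM) = {p_sbm:.6f}")
```

Under these assumptions the two logics are mirror images: voting lowers the probability of a failure to operate at the expense of more spurious operations, and vetoing does the opposite.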

One of the main concerns in reliability assessment is the accuracy of the reliability data.

In this study, component reliability data are based on published information. The

challenges in extrapolating these into system data have been recognised and

consequently sensitivity studies are undertaken. These show that the performance of SIPS is

significantly affected by the MTTF, MTTR and system conditions (e.g. line outage rate,

load level, etc.). In addition, sensitivity studies performed on the SIPS risk assessment

results provide useful guidance for utilities to identify the least reliable SIPS

component or operational phase. Reliability enhancement strategies could then be

implemented accordingly and enhanced inspection and maintenance allocated to the

vulnerable components. The studies also show that the arming phase is as important as the

activation phase in enhancing SIPS performance. In addition, the

application of sensitivity studies to Power System data effectively considers SIPS

performance under extreme system conditions.

With recent advances in wide area monitoring, protection and control technology, the

implementation of SIPS with a significant degree of centralisation has been completed

by some utilities. As illustrated in Chapter 6, enhanced performance can be provided by

a system wide GRS, as compared with a local GRS, given a relatively high reliability of

the wide area communication network. The access to wide area information could



significantly assist the monitoring of system conditions and bring more flexibility in

SIPS logic design. In the future, the centralised SIPS would significantly facilitate the

coordination amongst adjacent protection schemes.

According to the UK National Grid Electricity Ten-year Statement (ETYS), the future

Great Britain energy landscape is going to involve a significant deployment of

renewable energy such as wind generation to decrease the carbon intensity of the

electricity system. Power System operational conditions will become more

unpredictable due to the intermittent nature of renewables and demand-side

participation. Generator rejection schemes (GRS) are frequently used to trip non-

priority generators during overloading and make full use of the transmission capacity.

Performance of a GRS implemented in a wind rich system is analysed in the numerical

studies. Failure of the GRS to operate during system contingencies could cause cascade

tripping of the transmission lines due to overloading and eventually lead to the isolation

of the wind farm or the load.

To effectively evaluate the impact of the significant variation in wind generation on the

risk assessment results, a stochastic risk assessment method based on Sequential Monte

Carlo Simulation (SMCS) was developed in Chapter 6. By integrating the ARMA wind

prediction model and dynamic load model, the risk assessment method could accurately

capture the time-series variations in the load and wind generation and the time-

dependent events. When implemented at a wind farm, significant variations in the

operational risk associated with GRS normal operation, DBM and SBM can be

observed due to the variations in wind generation output. Therefore, a precise wind

prediction model and a dynamic risk assessment method are critical in forecasting and

managing GRS risks.
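
The idea of driving a sequential simulation with a time-series wind model can be sketched as follows. This is a simplified illustration and not the model developed in Chapter 6: the ARMA(1,1) coefficients, the mean and spread of the wind speed, the linear power curve and the export limit are all hypothetical.

```python
import random


def arma_wind_speeds(hours, mean=8.0, sigma=3.0, phi=0.9, theta=0.3, seed=1):
    """Generate hourly wind speeds (m/s) from an ARMA(1,1) deviation series."""
    rng = random.Random(seed)
    speeds, y_prev, a_prev = [], 0.0, 0.0
    for _ in range(hours):
        a = rng.gauss(0.0, 0.5)                    # white-noise innovation
        y = phi * y_prev + a - theta * a_prev      # ARMA(1,1) recursion
        speeds.append(max(0.0, mean + sigma * y))  # wind speed cannot be negative
        y_prev, a_prev = y, a
    return speeds


def turbine_output(v, rated_mw=3.0, cut_in=4.0, rated_v=12.0, cut_out=25.0):
    """Simplified power curve: zero, linear ramp, rated, zero above cut-out."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated_v:
        return rated_mw
    return rated_mw * (v - cut_in) / (rated_v - cut_in)


def hours_above_limit(hours=8760, turbines=50, export_limit_mw=100.0):
    """Count hours in which wind farm output exceeds a post-contingency limit."""
    speeds = arma_wind_speeds(hours)
    return sum(1 for v in speeds if turbines * turbine_output(v) > export_limit_mw)


if __name__ == "__main__":
    print("hours in which generator rejection would be needed:", hours_above_limit())
```

In a full sequential simulation the same chronological loop would also sample component failures and repairs, so that the exposure of the GRS is evaluated hour by hour.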

As a cost-effective alternative to transmission system upgrading, SIPS is a widely used

solution to deal with an increasingly stressed transmission network. This results in a

widespread proliferation of SIPS in many networks, leading to increased operational

complexity and a higher probability of unintended or undesired SIPS interactions. As

reviewed in Chapter 2, both the Irish incident on 5th August 2005 and the Nordic event

on 1st December were caused by the interaction between overlapping or neighbouring

SIPS, leading to severe consequences, such as a system blackout. The previous methods,

which focused on assessing the performance of a single SIPS, are no longer sufficient.



A procedure to evaluate the risk of undesirable interactions between SIPS is provided in

Chapter 7. The simulation performed on the PJM 5-bus system indicates that the

probability of unintended interactions between SIPS is highly related to the system

condition. Under stressed system conditions with a high load demand and generation

output, the cascading SIPS operation is more likely to occur. Unintended interactions

between SIPS could lead to a more severe impact, as compared with an individual SIPS

failure. In addition, the operating risk of SIPS, especially the risk caused by SIPS

interactions, would increase significantly as the number of the schemes in the Power

System rises.

In the near future, system operators will face more severe challenges in managing the

operational risk of SIPS. Therefore, the role of SIPS within the context of long-term

Power System planning is considered. Increasing wind penetration and rising load

levels will increase the operational cost of SIPS. However, the construction of new

transmission circuits significantly reduces the risks associated with SIPS maloperations,

decreases wind curtailment and allows greater access to lower cost and/or

environmentally friendly energy in the system. Therefore, a SIPS assisted transmission

upgrading plan helps maximize the reliable operation of the Power System, whilst

minimizing the production and investment costs.

A new type of SIPS with adaptive operational logic, which adjusts to the various system

conditions, was developed in Chapter 7 and used to manage the SIPS induced risk.

Currently, most existing SIPS are built based on predetermined seasonal and off-line

mitigation actions. However, adaptive SIPS, made available by modern IEDs and a

system wide monitoring system, allows system operators to shift the balance between

system dependability and security. When the system is less stressed and has a relatively

low load level, as compared to maximum or nominal system load, the risk from SBM is

the main source of SIPS operational risk. Consequently, more secure protection logic

can be implemented. In contrast, when the system is heavily stressed or when the other

scheme is in a failed state, the successful operation of the SIPS is vitally important in

preventing a system cascading failure and needs to be highly dependable. The adaptive

protection not only helps reduce the risk of DBM and SBM of individual SIPS, but also

helps achieve better coordination between SIPS. The significant variation in SIPS

operational risk due to the fast changing system conditions can be effectively mitigated.
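
The switching rule described in this paragraph can be sketched as a small decision function. The names, the two candidate logics and the 0.8 stress threshold are hypothetical choices made for illustration; they are not the settings of the adaptive scheme developed in Chapter 7.

```python
from enum import Enum


class TripLogic(Enum):
    SECURE = "2-out-of-2 (favours security, limits SBM)"
    DEPENDABLE = "1-out-of-2 (favours dependability, limits DBM)"


def select_logic(load_mw: float, peak_load_mw: float,
                 neighbour_scheme_healthy: bool,
                 stress_threshold: float = 0.8) -> TripLogic:
    """Pick a tripping logic from the present system condition.

    `stress_threshold` is a hypothetical fraction of peak load above which
    the system is treated as heavily stressed.
    """
    heavily_stressed = load_mw / peak_load_mw >= stress_threshold
    if heavily_stressed or not neighbour_scheme_healthy:
        return TripLogic.DEPENDABLE
    return TripLogic.SECURE


if __name__ == "__main__":
    print(select_logic(500.0, 1000.0, neighbour_scheme_healthy=True).value)
    print(select_logic(900.0, 1000.0, neighbour_scheme_healthy=True).value)
```

A practical scheme would use measured system quantities and hysteresis rather than a single threshold, so that the logic does not toggle on small load changes.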



In conclusion, the study undertaken in this research provides a comprehensive

framework to assess the reliability of SIPS designed to prevent system contingencies.

The proposed risk assessment methodologies could assist utilities to determine the

optimal SIPS design and to effectively manage the risks brought by SIPS to Power

System operation.

8.3. Future Work

Based on the work presented in this thesis, the suggestions for future work are

focused on assessing the newly emerged opportunities and challenges related to the

reliable operation of a Power System, the optimization of the SIPS reliability

assessment models and seeking new solutions to enhance SIPS reliability.

Investigating the next step in SIPS development

Currently, most existing SIPS are event-based with predetermined seasonally or off-line

defined mitigation actions. The enabling technologies that have brought great benefits

and advances in the design and application aspects of SIPS are discussed in this study.

With the platform of centralised SIPS and its wide-area communication infrastructures,

SIPS could use data from a wider area to enhance system estimation and provide

opportunities for WAMPAC schemes.

When the data required by a SIPS controller is sourced from different locations, it is

crucial that all the data is synchronised to ensure efficient and effective system

operation. Application of the IEEE 1588 Precision Time Protocol can achieve

sub-microsecond time synchronisation accuracy for both LAN and substation-wide time

information [100, 101]. In addition, the strict time requirements of the remedial control

actions need to be fulfilled. To achieve this, it is vital to ensure fast computing and data

processing by the central controllers and to confirm the effectiveness of the analysis

tools when delivering valid solutions for all possible system contingencies.
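
The principle behind the precision time protocol discussed above can be sketched with the usual two-way timestamp exchange. The sketch assumes equal forward and return path delays, and the timestamps are hypothetical values chosen for illustration.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Estimate slave clock offset and one-way path delay.

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes the forward and return path delays are equal.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    return offset, delay


if __name__ == "__main__":
    # Hypothetical timestamps in microseconds: true offset 1.5, true delay 40.
    offset, delay = ptp_offset_and_delay(t1=0.0, t2=41.5, t3=100.0, t4=138.5)
    print(f"offset = {offset} us, path delay = {delay} us")
```

Any asymmetry between the two path delays appears directly as an offset error, which is one reason why wide-area synchronisation is harder to guarantee than synchronisation within a substation LAN.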

In addition, IEC Technical Report 61850-90-5 [30] provides details of a communication

protocol for event-driven GOOSE messages, designed to extend their application from a

LAN to a WAN. This significantly facilitates the application of SIPS in a wider area.

However, it also raises concerns about the security of the GOOSE message over WAN

based communication. The GOOSE message needs to be encoded to reduce the



vulnerability related to cyber security. The Group Domain of Interpretation (RFC 6407 -

GDOI) can be used to provide symmetric keys to secure data signing and encryption.

However, the performance of GOOSE message communication during cyber-attacks

and the associated cyber security risk need to be investigated.

Furthermore, another important aspect in SIPS development is to develop better system

visualisation to enhance the capabilities of wide-area schemes. With the WAMPAC

system and phasor measurement units, system visualization of real-time data can be

realised. This gives system operators greater system awareness and

allows more precise control of the system, hence reducing the probability of

maloperations caused by human errors.

Investigation of the impact of demand side management on SIPS risks

In the SMCS based risk assessment procedure developed in Chapter 6, the IEEE-RTS

load model is used to capture the load variation over a calendar year. Test results

indicate that the risk of SIPS DBM becomes extremely high during severe system

conditions with high wind generation output and extreme load demand. The use of

demand management, enabled by innovations in advanced metering infrastructures,

communications and smart appliances, brings more sophisticated demand response

options. It helps shift consumption from peak hours to off-peak hours. Appropriate load

shifting is becoming more crucial with the popularization of electric vehicles (EV).

However, changing human behaviour based on electricity pricing or direct control is

problematic and is often less effective than expected [102].

Hence, in the future, it is recommended to consider how demand side management can

reduce the SIPS operational risks. By using either direct load control (DLC) or real time

pricing (RTP), the peak-to-average ratio (PAR) in load demand can be effectively

reduced [100, 103]. If this is achievable, the significantly high risk of SIPS operation

during severe system conditions can be mitigated. Furthermore, the operation of SIPS

and demand side management strategies need to be coordinated to achieve optimal

system reliability.
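
The effect of load shifting on the peak-to-average ratio can be illustrated with a short sketch. The load profile and the amount shifted are hypothetical, and the single shift from the peak period to the lowest-demand period is only a crude stand-in for a DLC or RTP scheme.

```python
def peak_to_average_ratio(load_mw: list[float]) -> float:
    """Peak-to-average ratio (PAR) of a load profile."""
    return max(load_mw) / (sum(load_mw) / len(load_mw))


def shift_peak_load(load_mw: list[float], shift_mw: float) -> list[float]:
    """Move `shift_mw` from the peak period to the lowest-demand period.

    Total energy is unchanged; this mimics a single direct load control action.
    """
    shifted = list(load_mw)
    peak = shifted.index(max(shifted))
    trough = shifted.index(min(shifted))
    shifted[peak] -= shift_mw
    shifted[trough] += shift_mw
    return shifted


if __name__ == "__main__":
    profile = [60.0, 50.0, 80.0, 120.0, 90.0]  # hypothetical load in MW
    print("PAR before shifting:", peak_to_average_ratio(profile))
    print("PAR after shifting: ", peak_to_average_ratio(shift_peak_load(profile, 20.0)))
```

Because the SIPS risk is concentrated in the peak periods, a lower PAR reduces the time the system spends in the conditions under which a DBM is most costly.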

Verification of Protection Asset End-of-life Analysis



The application of an end-of-life evaluation process to support and validate an asset life

extension decision for various selected relay types was described in Chapter 4. Due to

the critical function of the protection devices, the assessment of asset life is undertaken

during the reliable service lifetime of the equipment, and before the occurrence of a

significant increase in ageing related failures.

To validate the effectiveness of end-of-life assessment based on sample testing with

limited lifetime data, specific follow-up rechecking procedures are required for each

relay type. The following tests can be done if more ageing related failures could be

identified:

1) For any targeted relay that fails in service during the extended lifetime, studies need

to be carried out to investigate the failure and report results, including any impact on

replacement life policy, and on conclusions of this set of reports.

2) Once more ageing related failure data are available, statistical analysis can be carried

out to predict the “rising edge” of the bath-tub curve for the protection devices. The

estimated reliable service life of the protection asset can be used to compare with the

conclusions drawn in this study.
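
One common way to describe the rising edge of the bath-tub curve, once enough ageing related failure data exist, is to add a Weibull wear-out term to a constant useful-life failure rate. The sketch below assumes this model; the base rate, shape and scale values are hypothetical and are not estimates for any relay type studied in this thesis.

```python
def weibull_hazard(t_years: float, shape: float, scale_years: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale_years) * (t_years / scale_years) ** (shape - 1.0)


def onset_of_wear_out(base_rate: float, shape: float, scale_years: float,
                      factor: float = 2.0, horizon_years: int = 60):
    """First whole year in which total hazard exceeds `factor` x the base rate.

    Total hazard = constant useful-life rate + Weibull wear-out term.
    Returns None if the threshold is not reached within the horizon.
    """
    for year in range(1, horizon_years + 1):
        total = base_rate + weibull_hazard(year, shape, scale_years)
        if total > factor * base_rate:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical parameters: 0.01 failures/year base rate, shape 5, scale 40 years.
    print("estimated onset of wear-out (years):", onset_of_wear_out(0.01, 5.0, 40.0))
```

In practice the shape and scale parameters would be fitted to the recorded failure ages, and the resulting onset age compared with the asset life assumed in the extension decision.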

Furthermore, with experience gained in the present study, the evaluation process can

then be applied to other electronic devices with similar components and hardware

platform.

Development of a reliability database

One of the main concerns in reliability assessment is the accuracy of the reliability data.

It significantly affects the usefulness of the reliability assessment results. Currently,

most of the data used in the reliability assessment are based on reliability standards,

instead of field performance. Sensitivity studies, as illustrated in this thesis, are an

effective method to account for the uncertainty in reliability data. However, in

the future, the development of a reliability database based on field performance will be

of considerable use in increasing the accuracy of the reliability assessment and

enhancing the quality of the prediction of the Power System risk. Long-term

monitoring of Power System component defects and the tracking of their operational

conditions would also contribute to a better understanding of the components’ life cycle.


References

[1] S. H. Horowitz and A. G. Phadke, "Boosting immunity to blackouts," IEEE

Power and Energy Magazine, vol. 1, pp. 47-53, 2003.

[2] I. Bazovsky, Reliability Theory and Practice. Dover Publications, 1961.

[3] CIGRE, "POWER SYSTEM RELIABILITY ANALYSIS," CIGRE WG 03 of SC

38, 1987.

[4] R. Billinton and R. N. Allan, Reliability Assessment of Large Electric Power

Systems. Kluwer Academic Publishers, 1988.

[5] R. Billinton and R. N. Allan, Reliability Evaluation of Power Systems. Plenum

Press, 1996.

[6] R. Billinton and W. Li, Reliability Assessment of Electric Power Systems Using

Monte Carlo Methods. Plenum Press, New York, 1994.

[7] P. Kundur, J. Paserba, V. Ajjarapu, G. Andersson, A. Bose, C. Canizares, N.

Hatziargyriou, D. Hill, A. Stankovic, C. Taylor, T. V. Cutsem, and V. Vittal,

"Definition and classification of power system stability IEEE/CIGRE joint task

force on stability terms and definitions," IEEE Transactions on Power Systems,

vol. 19, pp. 1387-1401, 2004.

[8] F. Rahimi, A. Ipakchi, and F. Fletcher, "The Changing Electrical Landscape:

End-to-End Power System Operation Under the Transactive Energy Paradigm,"

IEEE Power and Energy Magazine, vol. 14, pp. 52-62, 2016.

[9] NERC System Disturbance Reports. Available:

http://www.nerc.com/pa/rrm/ea/System%20Disturbance%20Reports%20DL/For

ms/AllItems.aspx

[10] "Report on Investigation into System Disturbance of August 5th 2005,"

Electricity Supply Board (ESB) and National Grid, Dec. 2005.

[11] J. Walseth, J. Eskedal, and O. Breidablik, "Analysis of Misoperations of

Protection Schemes in the Nordic Grid," Protection, Automation, and Control

World, March, 2010.

[12] US-Canada Power System Outage Task Force, "Blackout 2003: Final report on

the August 14, 2003 blackout in the United States and Canada: Causes and

recommendations," Office of Electricity Delivery & Energy

Reliability, Washington, DC, 2004.

[13] "Report of the Enquiry Committee on grid disturbance in northern region on 30th

July 2012 and in northern, eastern & northeastern region on 31st July 2012.," The

Enquiry Committee, Ministry of Commerce and Industry, Government of India,

New Delhi, India, 2012.

[14] V. Madani, D. Novosel, S. Horowitz, M. Adamiak, J. Amantegui, D. Karlsson, S.

Imai, and A. Apostolov, "IEEE PSRC Report on Global Industry Experiences

With System Integrity Protection Schemes (SIPS)," IEEE Transactions on Power

Delivery, vol. 25, pp. 2143-2155, 2010.

[15] S. H. Horowitz, D. Novosel, V. Madani, and M. Adamiak, "System-wide

Protection," IEEE Power and Energy Magazine, vol. 6, pp. 34-42, 2008.

[16] V. Madani, M. Adamiak, and M. Thakur, "Design and implementation of wide

area special protection schemes," in 57th Annual Conference for Protective Relay

Engineers, 2004, 2004, pp. 392-402.


[17] P. M. Anderson and B. K. LeReverend, "Industry experience with special

protection schemes," IEEE Transactions on Power Systems, vol. 11, pp. 1166-

1179, 1996.

[18] WECC Relay Work Group, "Remedial Action Scheme Design Guide," February

2006.

[19] W. Winter and B. LeReverend, "Operational performance of bulk electricity

system control aids," Electra, N. 123, March 1989.

[20] J. D. McCalley and F. Weihui, "Reliability of special protection systems," IEEE

Transactions on Power Systems, vol. 14, pp. 1400-1406, 1999.

[21] NERC reliability standards. (16 Mar). Protection and control. Available:

http://www.nerc.net/standardsreports/standardssummary.aspx

[22] ISA, "Safety Instrumented Functions (SIF) - Safety Integrity Level (SIL)

Evaluation Techniques," 17 June 2002.

[23] Wikipedia. (16 Mar). Available: http://en.wikipedia.org/wiki/Spurious_trip_level

[24] K. Harker, "The north wales supergrid special protection schemes," Electronics

and Power, vol. 30, pp. 719-724, 1984.

[25] M. Panteli, P. A. Crossley, and J. Fitch, "Quantifying the reliability level of

system integrity protection schemes," IET Generation, Transmission &

Distribution, vol. 8, pp. 753-764, 2014.

[26] D. Miller, R. Schloss, and S. Manson, "PacifiCorp’s Jim Bridger RAS: A dual

triple modular redundant case study," Mar. 2, 2009.

[27] K. Baskin, M. Thompson, and L. Lawhead, "Design and testing of a system to

classify faults for a generation-shedding RAS," in 2009 62nd Annual Conference

for Protective Relay Engineers, 2009, pp. 140-149.

[28] J. Wen, W. H. E. Liu, P. L. Arons, and S. K. Pandey, "Evolution Pathway

Towards Wide Area Monitoring and Protection—A Real-World

Implementation of Centralized RAS System," IEEE Transactions on Smart Grid,

vol. 5, pp. 1506-1513, 2014.

[29] IEC 61850 communication networks and systems in substations Specific

Communication Service Mapping (SCSM) - Mapping to MMS (ISO 9506-1 and

ISO 9506-2) and to ISO/IEC 8802-3, pt. 8-1. Available: http://www.iec.ch

[30] IEC TR 61850-90-5:2012, "Communication networks and systems for power

utility automation - Part 90-5: Use of IEC 61850 to transmit synchrophasor

information according to IEEE C37.118," 2012.

[31] All Island Transmission System Map. Available:

http://smartgriddashboard.eirgrid.com/#all/transmission-map

[32] V. Terzija, G. Valverde, D. Cai, P. Regulski, V. Madani, J. Fitch, S. Skok, M. M.

Begovic, and A. Phadke, "Wide-Area Monitoring, Protection, and Control of

Future Electric Power Networks," Proceedings of the IEEE, vol. 99, pp. 80-93,

2011.

[33] S. Tamronglak, S. H. Horowitz, A. G. Phadke, and J. S. Thorp, "Anatomy of

power system blackouts: preventive relaying strategies," IEEE Transactions on

Power Delivery, vol. 11, pp. 708-715, 1996.

[34] S. H. Horowitz, A. G. Phadke, and J. S. Thorp, "Adaptive transmission system

relaying," IEEE Transactions on Power Delivery, vol. 3, pp. 1436-1445, 1988.

[35] K. Chul-Hwan, H. Jeong-Yong, and R. K. Aggarwal, "An enhanced zone 3

algorithm of a distance relay using transient components and state diagram,"

IEEE Transactions on Power Delivery, vol. 20, pp. 39-46, 2005.


[36] S. Sheng, K. K. Li, W. L. Chan, X. Zeng, D. Shi, and X. Duan, "Adaptive Agent-

Based Wide-Area Current Differential Protection System," IEEE Transactions on

Industry Applications, vol. 46, pp. 2111-2117, 2010.

[37] M. Begovic, D. Novosel, D. Karlsson, C. Henville, and G. Michel, "Wide-Area

Protection and Emergency Control," Proceedings of the IEEE, vol. 93, pp. 876-

891, 2005.

[38] IEC 61850 Communication networks and systems in substations—Use of IEC

61850 for the communication between substations pt. 90–1. Available:

http://www.iec.ch

[39] J. Sykes, M. Adamiak, and G. Brunello, "Implementation and Operational

Experience of a Wide Area Special Protection Scheme on the SRP System," in

2006 Power Systems Advanced Metering, Protection, Control, Communication,

and Distributed Resources, 2006, pp. 145-158.

[40] M. G. Adamiak, A. P. Apostolov, M. M. Begovic, C. F. Henville, K. E. Martin, G.

L. Michel, A. G. Phadke, and J. S. Thorp, "Wide Area

Protection—Technology and Infrastructures," IEEE Transactions on Power

Delivery, vol. 21, pp. 601-609, 2006.

[41] Y. Wang, W. Li, and J. Lu, "Reliability Analysis of Wide-Area Measurement

System," IEEE Transactions on Power Delivery, vol. 25, pp. 1483-1491, 2010.

[42] Alstom Grid, Network Protection & Automation Guide: Protective Relays,

Measurement & Control, May 2011.

[43] IEC 61850, "Communication networks and systems in substations," Institute of

Electrical and Electronics Engineers, Tech. Rep., 2002-2005.

[44] P. Zhang, L. Portillo, and M. Kezunovic, "Reliability and Component Importance

Analysis of All-Digital Protection Systems," in 2006 IEEE PES Power Systems

Conference and Exposition, 2006, pp. 1380-1387.

[45] P. Leupp and C. Rytoft, "Special Report IEC 61850," ABB.

[46] J. Wen, C. Hammond, and E. A. Udren, "Wide-area Ethernet network

configuration for system protection messaging," in 2012 65th Annual Conference

for Protective Relay Engineers, 2012, pp. 52-72.

[47] "IEEE Standard Communication Delivery Time Performance Requirements for

Electric Power Substation Automation," IEEE Std 1646-2004, pp. 0_1-24, 2005.

[48] K.P. Brand, C. Brunner, and W. Wimmer, "Design of IEC61850 based Substation

Automation System according to Customer Requirements," CIGRE Plenary

meeting, Session of SC B5, Paper B5-103, Paris, 2004.

[49] G. Antonova, L. Frisk, and J. C. Tournier, "Communication redundancy for

substation automation," in 2011 64th Annual Conference for Protective Relay

Engineers, 2011, pp. 344-355.

[50] International Electrotechnical Commission IEC 62439-3, "Industrial

communication networks - High availability automation networks - Part 3:

Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy

(HSR)," 2016.

[51] B. Kasztenny, J. Whatley, and E. A. Udren, "IEC 61850: A Practical Application

Primer for Protection Engineers," 60th Annual Georgia Tech Protective Relaying

Conference, Atlanta, GA, May 3-5, 2006.

[52] H. Hajian-Hoseinabadi, "Availability Comparison of Various Power Substation

Automation Architectures," IEEE Transactions on Power Delivery, vol. 28, pp.

566-574, 2013.


[53] H. Hajian-Hoseinabadi and M. E. H. Golshan, "Availability, Reliability, and

Component Importance Evaluation of Various Repairable Substation Automation

Systems," IEEE Transactions on Power Delivery, vol. 27, pp. 1358-1367, 2012.

[54] L. R. C. Ferreira, P. A. Crossley, J. Goody, and R. N. Allan, "Reliability

evaluation of substation control systems," IEE Proceedings - Generation,

Transmission and Distribution, vol. 146, pp. 626-632, 1999.

[55] L. Andersson, K. P. Brand, C. Brunner, and W. Wimmer, "Reliability

investigations for SA communication architectures based on IEC 61850," in 2005

IEEE Russia Power Tech, 2005, pp. 1-7.

[56] "IEEE Recommended Practice for Design of Reliable Industrial and Commercial

Power Systems (IEEE Gold Book)," IEEE Standard 493-2007, 2007.

[57] IETF RFC 3376, "Internet Group Management Protocol, Version 3," October,

2002.

[58] B. Beresh and B. Machie, "I22: End-Of-Life Assessment of P&C Devices," PSRC

Working Group, May 2015.

[59] P. J. Smith, M. Shafi, and G. Hongsheng, "Quick simulation: a review of

importance sampling techniques in communications systems," IEEE Journal on

Selected Areas in Communications, vol. 15, pp. 597-613, 1997.

[60] H. Maciejewski, G. J. Anders, and J. Endrenyi, "On the use of statistical methods

and models for predicting the end of life of electric power equipment," in 2011

International Conference on Power Engineering, Energy and Electrical Drives,

2011, pp. 1-6.

[61] V. I. Kogan, J. A. Fleeman, J. H. Provanzana, and C. H. Shih, "Failure analysis of

EHV transformers," IEEE Transactions on Power Delivery, vol. 3, pp. 672-683,

1988.

[62] M. T. Schilling, J. C. G. Praca, J. F. d. Queiroz, C. Singh, and H. Ascher,

"Detection of ageing in the reliability analysis of thermal generators," IEEE

Transactions on Power Systems, vol. 3, pp. 490-499, 1988.

[63] B. Retterath, S. S. Venkata, and A. A. Chowdhury, "Impact of time-varying

failure rates on distribution reliability," in 2004 International Conference on

Probabilistic Methods Applied to Power Systems, 2004, pp. 953-958.

[64] L. Wenyuan, "Evaluating mean life of power system equipment with limited end-

of-life failure data," in IEEE Power Engineering Society General Meeting, 2005,

2005, p. 2390 Vol. 3.

[65] J. E. Cota-Felix, F. Rivas-Davalos, and S. Maximov, "An alternative method for

estimating mean life of power system equipment with limited end-of-life failure

data," in 2009 IEEE Bucharest PowerTech, 2009, pp. 1-4.

[66] R. L. M. Auterei, T. Rahman, A. Wen, D. Zee, P. J. Tavener, "Investigation on

Ageing and Life Extension of Protective Relays," Dec 2009 – Nov 2011.

[67] National Grid, "SHNB, THR and LFCB Relay Age Profile (internal documents)," 2013.

[68] National Grid, "FAULTS-2000-2013 –PROTN Malops THR-SHNB-LFCB," 2014.

[69] National Grid, "Thermal Overload Capabilities of Protection Equipment," TGN(E) 66,

issue 3, September 2000.

[70] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – SHNB,"

Quanta Technology, LLC for National Grid, February 2015.

[71] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – THR,"



[72] N. L. P. Crossley, B. Gwyn, et al, "Asset Life Extension Evaluation – LFCB,"


[73] P. A. Agyakwa, L. Yang, M. R. Corfield, and C. M. Johnson, "A non-destructive

study of crack development during thermal cycling of Al wire bonds using x-ray

computed tomography," in CIPS 2014; 8th International Conference on

Integrated Power Electronics Systems, 2014, pp. 1-5.

[74] F. Weihui, Z. Sanyi, J. D. McCalley, V. Vittal, and N. Abi-Samra, "Risk

assessment for special protection systems," in 2002 IEEE Power Engineering

Society Winter Meeting. Conference Proceedings (Cat. No.02CH37309), 2002, p.

740 vol.2.

[75] M. Esmaili, A. Hajnoroozi, and H. Shayanfar, "Risk Evaluation of Online Special

Protection Systems," International Journal of Electrical Power & Energy Systems,

vol. 41, pp. 137-144, 2012.

[76] T. Y. Hsiao and C. N. Lu, "Risk Informed Design Refinement of a Power System

Protection Scheme," IEEE Transactions on Reliability, vol. 57, pp. 311-321, 2008.

[77] J. L. C. d. Miguel, P. J. Ramírez, S. H. Tindemans, and G. Strbac, "Cost-benefit

analysis of unreliable System Protection Scheme operation," in 2015 IEEE

Eindhoven PowerTech, 2015, pp. 1-6.

[78] J. L. Calvo, S. H. Tindemans, and G. Strbac, "Managing risks from reverse flows

under distribution network outage scenarios," in IET International Conference on

Resilience of Transmission and Distribution Networks (RTDN) 2015, 2015, pp. 1-

6.

[79] C. Shipman, K. Hopkinson, and J. Lopez, "Con-resistant trust for improved

reliability in a smart grid special protection system," in 2015 IEEE Power &

Energy Society General Meeting, 2015, pp. 1-1.

[80] R. Billinton and A. Sankarakrishnan, "A comparison of Monte Carlo simulation

techniques for composite power system reliability assessment," in IEEE

WESCANEX 95. Communications, Power, and Computing. Conference

Proceedings, 1995, pp. 145-150 vol.1.

[81] UK National Grid. (2016). Electricity Ten Year Statement 2016. Available:

http://www2.nationalgrid.com/UK/Industry-information/Future-of-

Energy/Electricity-Ten-Year-Statement/

[82] Int. Electrotech. Comm., "Communication networks and systems in substations -

Part 9-2: Specific Communication Service Mapping (SCSM) - Sampled values

over ISO/IEC 8802-3," ed, 2004.

[83] V. Madani, E. Taylor, D. Erwin, A. Meklin, and M. Adamiak, "High-Speed

Control Scheme to Prevent Instability of A Large Multi-Unit Power Plant," in

2007 60th Annual Conference for Protective Relay Engineers, 2007, pp. 271-282.

[84] R. Billinton and R. N. Allan, Reliability Evaluation of Engineering Systems, 1983.

[85] C. Grigg, P. Wong, P. Albrecht, R. Allan, M. Bhavaraju, R. Billinton, Q. Chen, C.

Fong, S. Haddad, S. Kuruganty, W. Li, R. Mukerji, D. Patton, N. Rau, D. Reppen,

A. Schneider, M. Shahidehpour, and C. Singh, "The IEEE Reliability Test

System-1996. A report prepared by the Reliability Test System Task Force of the

Application of Probability Methods Subcommittee," IEEE Transactions on

Power Systems, vol. 14, pp. 1010-1020, 1999.

[86] E. Leahy and R. S. J. Tol, "An Estimate of the Value of Lost Load for Ireland,"

ESRI Working Paper 357, 2010.


[87] Electricity Networks Strategy Group, "Our Electricity Transmission Network: A

Vision for 2020," Technical Report URN: 09D/717, July 2009.

[88] UK National Grid. (2016). System Operability Framework 2016. Available:

http://www2.nationalgrid.com/UK/Industry-information/Future-of-

Energy/System-Operability-Framework/

[89] G. Sinden, "Characteristics of the UK wind resource: Long-term patterns and

relationship to electricity demand," Energy Policy, vol. 35, pp. 112–117, 2007.

[90] A. J. Roscoe and G. Ault, "Supporting high penetrations of renewable generation

via implementation of real-time electricity pricing and demand response," IET

Renewable Power Generation, vol. 4, pp. 369-382, 2010.

[91] UK National Grid. (2016). Network Options Assessment. Available:

http://www2.nationalgrid.com/UK/Industry-information/Future-of-

Energy/Network-Options-Assessment/

[92] P. Glynn and W. Whitt, "The asymptotic validity of sequential stopping rules for

stochastic simulations," Ann. Appl. Probab., vol. 2, pp. 180–198, 1992.

[93] University of Reading, "Documentation for wind profile program."

[94] R. Billinton, C. Hua, and R. Ghajar, "A sequential simulation technique for

adequacy evaluation of generating systems including wind energy," IEEE

Transactions on Energy Conversion, vol. 11, pp. 728-734, 1996.

[95] Vestas, "V90-e.0MW Turbine."

[96] J. McCalley, O. Oluwaseyi, V. Krishnan, R. Dai, C. Singh, and K. Jiang, "System

Protection Schemes: Limitations, Risks, and Management," PSERC Publications,

December 2010.

[97] O. Olatujoye, V. Krishnan, and J. McCalley, "Including special protection

schemes and operational complexity within transmission planning," Power and

Energy Society General Meeting, 2011.

[98] F. Li and R. Bo, "Small test systems for power system economic studies," in

IEEE PES General Meeting, 2010, pp. 1-4.

[99] F. Weihui, Z. Sanyi, J. D. McCalley, V. Vittal, and N. Abi-Samra, "Risk

assessment for special protection systems," IEEE Transactions on Power Systems,

vol. 17, pp. 63-72, 2002.

[100] A. H. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and A. Leon-

Garcia, "Autonomous Demand-Side Management Based on Game-Theoretic

Energy Consumption Scheduling for the Future Smart Grid," IEEE Transactions

on Smart Grid, vol. 1, pp. 320-331, 2010.

[101] C. M. D. Dominicis, P. Ferrari, A. Flammini, S. Rinaldi, and M. Quarantelli, "On

the Use of IEEE 1588 in Existing IEC 61850-Based SASs: Current Behavior and

Future Challenges," IEEE Transactions on Instrumentation and Measurement,

vol. 60, pp. 3070-3081, 2011.

[102] Y. Li and P. A. Crossley, "Voltage balancing in low-voltage radial feeders using

Scott transformers," IET Generation, Transmission & Distribution, vol. 8, pp.

1489-1498, 2014.

[103] P. Palensky and D. Dietrich, "Demand Side Management: Demand Response,

Intelligent Energy Systems, and Smart Loads," IEEE Transactions on Industrial

Informatics, vol. 7, pp. 381-388, 2011.


Appendix A: Protection Fingerprint Testing

A.1 Line Parameters and Protection Settings

The settings of each relay type and the circuit parameters of the protected lines were provided by National Grid, based on its 400 kV transmission system. The fingerprint tests were then performed using the following parameters.

1) SHNB:

Line Parameters:

Line length: 81 km

Current Transformer Ratio: 2000/1

Voltage Transformer Ratio: 400k/110

Line Impedance (% on 100MVA): Z1 = 0.1718 + j1.5709 Z0 = 0.7282 + j4.3201

Z0m = 0.5564 + j2.3319

SHNB MicroMho Relay Settings:

Vn = 110V (Voltage rating at secondary side)

In = 1A (Current rating at secondary side)

Residual Compensation Factor (R.C.F) = 0.538

Relay Characteristic Angle (RCA) = 85 DEG

Scheme Selection: X = 1, Y = 1

Zone Settings:

Zone 1: Forward reach = 11.52 Ω sec 80% Trip time: 0 s

Zone 2: Forward reach = 21.6 Ω sec 150% Trip time: 500 ms

Zone 3: Forward reach = 28.8 Ω sec 200%

Reverse Reach = 2.4 Ω sec -16.5% Trip time: 1 s

2) THR:

Line Parameters:

Line length: 51 km

Current Transformer Ratio: 2000/1

Voltage Transformer Ratio: 400k/110

Line Impedance (% on 100MVA): Z1 = 0.055 + j0.8534 Z0 = 0.3306 + j2.5346

Z0m = 0.2745 + j1.3521

THR Relay Settings:

Vn = 110V (Voltage rating at secondary side)

In = 2A (Current rating at secondary side)

Residual Compensation Factor (R.C.F) = 0.627

Relay Characteristic Angle (RCA) = 75 DEG

Zone Settings:

Zone 1: Forward reach = 6.12 Ω sec 81.4% Trip time: 0 s

Zone 2: Forward reach = 11.63 Ω sec 155% Trip time: 500 ms

Zone 3: Forward reach = 19.58 Ω sec 260%

Reverse Reach = 1.96 Ω sec 26.1% Trip time: 1 s

3) LFCB:

Protection Settings:

Lower Slope Threshold (IS1): 0.20 p.u.

Lower Slope % Bias (k1): 30%

Upper Slope Threshold (IS2): 2.00 p.u.

Upper Slope % Bias (k2): 150%

Permissive Intertrip Time (PIT): 0 sec

Communications Settings:

Comms. Channel Delay Tolerance = 250 microsec

Comms. Channel Failure Alarm Time = 9.9 sec

Relay Address = 1-A

Serial Port Settings:

Baud Rate: 4800 baud

Bit Framing Format: Data bits: 8 bits

Parity: none

Stop bits: 1 bit

Remote Access Level: Limited

Scheme Logic Settings:

Block Auto-reclose Mode: PIT & 3PH Fault

Tripping Mode: Three-pole

Configuration: 2 ended

Time Synchronization: Time Sync. Period: 30 min
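The LFCB protection settings listed above (IS1, k1, IS2, k2) define a dual-slope percentage-bias characteristic. The following is a minimal sketch of how the operating threshold could be evaluated from these settings. It assumes the commonly used form in which the relay operates when the differential current exceeds k1·Ibias + IS1 below IS2, and k2·Ibias − (k2 − k1)·IS2 + IS1 at or above IS2. The exact characteristic implemented in the LFCB is not stated in this appendix, and the function names are illustrative only.

```python
def bias_threshold(i_bias, is1=0.20, k1=0.30, is2=2.00, k2=1.50):
    """Differential-current threshold (p.u.) for a given bias current (p.u.).

    Defaults are the LFCB settings from Section A.1; the dual-slope form
    itself is an assumption, not taken from the thesis.
    """
    if i_bias < is2:
        # Lower slope: sensitive region for low through-current
        return k1 * i_bias + is1
    # Upper slope: continuous with the lower slope at i_bias == is2
    return k2 * i_bias - (k2 - k1) * is2 + is1


def lfcb_operates(i_diff, i_bias, **settings):
    """True if the differential current lies above the bias characteristic."""
    return i_diff > bias_threshold(i_bias, **settings)


if __name__ == "__main__":
    for i_bias in (0.0, 1.0, 2.0, 3.0):
        print(f"Ibias = {i_bias:.1f} p.u. -> threshold = {bias_threshold(i_bias):.2f} p.u.")
```

With the tabulated settings the two slopes meet at a threshold of 0.8 p.u. when the bias current equals IS2 = 2.0 p.u., so the characteristic is continuous under this assumed form.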

A.2 PSCAD Model Used for Dynamic Fault Based Distance Relay Testing

Figure A-1: PSCAD Model used for Dynamic Fault based Distance Relay Testing
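When the distance relay settings of Section A.1 are checked against the line data, the per-cent impedances must first be converted to ohms as seen by the relay. The sketch below shows one way to do this. It assumes a 400 kV base voltage together with the stated 100 MVA base, and uses the tabulated CT and VT ratios; the function and variable names are illustrative and do not come from the thesis.

```python
import math


def secondary_ohms(z_percent, base_kv=400.0, base_mva=100.0,
                   ct_ratio=2000.0 / 1.0, vt_ratio=400e3 / 110.0):
    """Convert a per-cent impedance to secondary ohms seen by the relay.

    base_kv = 400 is an assumption; the CT/VT ratios are those of Section A.1.
    """
    z_base = base_kv ** 2 / base_mva            # primary base impedance in ohms
    z_primary = z_percent / 100.0 * z_base      # primary ohms
    return z_primary * ct_ratio / vt_ratio      # refer to the relay side


# Positive-sequence line impedances (% on 100 MVA) from Section A.1
line_z1_percent = {
    "SHNB": complex(0.1718, 1.5709),
    "THR": complex(0.055, 0.8534),
}

line_z1_sec = {name: secondary_ohms(z) for name, z in line_z1_percent.items()}

if __name__ == "__main__":
    for name, z in line_z1_sec.items():
        angle = math.degrees(math.atan2(z.imag, z.real))
        print(f"{name}: |Z1| = {abs(z):.2f} ohm sec, angle = {angle:.1f} deg")
    # Zone 1 reach of the THR relay as a fraction of the computed line impedance
    print(f"THR zone 1: {6.12 / abs(line_z1_sec['THR']):.1%} of line")
```

Under these assumptions the THR line evaluates to about 7.5 Ω secondary, which is consistent with the tabulated reach percentages. The SHNB line evaluates to about 13.9 Ω secondary, slightly below the 14.4 Ω implied by its tabulated percentages, which suggests that those settings may have been derived from slightly different line data.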

Appendix B: Vulnerable Components Assessment

B.1 Vulnerable Components Examined via X-ray Tomography

Relay type | No. | Component | Module (function) | Relay (type/serial no.) | Relay age (years) | Operating temp. (°C)
SHNB | 1 | HMOS single component microcontroller (plastic encapsulated), P8039AHL L5222957 (INTEL 1977) | 16 (Microprocessor) | SHNB102 | 17 | 33
SHNB | 2 | Voltage regulator (plastic encapsulated), MC T7805CT | 21/23/25 (Comparator) | SHNB102 | 17 | 32
SHNB | 3 | JFET input operational amplifier, M53AK LF355H | 30 (Voltage input) | SHNB102 | 17 | 34
SHNB | 4 | Operational amplifier, UA741CN | 13 (Voltage supervision) | SHNB101 | 7-8 | -
SHNB |  | UA741CN | 13 (Voltage supervision) | SHNB102 | 17 | 33
SHNB |  | UA741CN | 18 (Phase & Neutral) | SHNB102 | 17 | 27
SHNB | 7 | Voltage regulator (metal can package) | 21/23/25 (Comparator) |  | 22 | 29
SHNB | 8 | Voltage regulator (metal can package) | 21/23/25 (Comparator) | SHNB101 | 7-8 | -
SHNB | 9 | Small signal diode, BAV21 | 13 (Voltage supervision) | SHNB102 | 17 | 33
THR | 1 | TO-18 metal case transistor, 2N2906 | Power Supply UV-OV-OC Card (T4) | THR | 32 | ≤ 25
THR | 2 | TO-18 metal case transistor, 2N2222A | Earth Fault Box V.T.S Module (T45) | THR | 32 | ≤ 25
THR | 3 | Zener diode, BZY88C | Phase Fault Box Z1B-R comparator (D6) | THR | 32 | ≤ 29
THR | 4 | Film resistor, 2.2 k (±5%) 2W | Power Supply Output Regulator Card (R22) | THR | 32 | 49
LFCB | 1 | Voltage regulator (plastic), HM91AR LM940T | 4 (GM0052021) (Communication controller board) | LFCB192(102) 547373C | 15 | 41
LFCB |  | 7812CT | 4 (GM0052021) (Communication controller board) | LFCB192(102) 547373C | 15 | 35
LFCB | 3 | Enhanced serial communications controller (ESCC), AM85C30-8JC | 4 (GM0052021) | LFCB192(102) 547373C | 15 | 33
LFCB |  |  | 4 (GM0052021) (Communication interface board) | LFCB103 208284J | 8 | -
LFCB |  | 7812CT | 4 (GM0052021) (Controller board) | LFCB103 208284J | 8 | -

B.2 Component Degradation Mechanisms

Degradation Mechanisms of Transistor Packages

The anatomy of a generic single-chip transistor package is shown in Figure B-1. It depicts a monolithic integrated circuit with a centrally positioned leadframe pad, or substrate, to which the semiconductor die is attached. Bond wires interconnect the die and the leads, and a polymeric mould compound holds the assembly together, prevents the ingress of moisture and dust, and provides dielectric insulation.

Figure B-1: Typical Plastic Encapsulated Transistor Package

During the operation of a transistor package, heat is generated within the semiconductor chip(s) by switching and conduction losses. This heat must be removed (i.e. transferred to the ambient air) as efficiently as possible to maximise the electrical performance and mechanical reliability of the component. Heat is conducted away through the package leadframe, allowing the component to remain within its operating temperature limits. The lower the junction temperature of the device, the more reliably the module will function. A path of low thermal impedance must therefore be created from the device to a point where the heat can be dissipated safely without damage to the circuit. The following reliability considerations need to be assessed:

1) Die attachment / solder joint reliability: the effectiveness of the thermal path is determined to a large extent by the die attachment or solder layer between the chip and the leadframe pad.

2) Wire bond reliability: wire bond lift-off is one of the most common causes of failure. Lift-off leads to loss of electrical interconnection and to impairment or failure of function.

3) Leadframe reliability: moisture that enters through the encapsulant and migrates to the resin-leadframe interfaces can cause delamination, loss of voltage withstand capability and open-circuit failures.
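The thermal-path reasoning above can be illustrated with a steady-state series thermal-resistance estimate, in which the junction temperature equals the ambient temperature plus the dissipated power multiplied by the total thermal resistance of the path. The sketch below is illustrative only: all numerical values (power, thermal resistances, ambient temperature) are hypothetical and are not measurements from the relays examined in this appendix.

```python
def junction_temperature(t_ambient_c, power_w, thermal_resistances_k_per_w):
    """Steady-state junction temperature for a series thermal path.

    T_j = T_a + P * sum(R_th); every input value used below is hypothetical.
    """
    return t_ambient_c + power_w * sum(thermal_resistances_k_per_w)


# Hypothetical path: die attach, leadframe-to-case, case-to-ambient (K/W)
healthy_path = [2.0, 8.0, 40.0]
# Same path with the die-attach resistance doubled, e.g. by voiding (assumed)
degraded_path = [4.0, 8.0, 40.0]

t_healthy = junction_temperature(25.0, 0.5, healthy_path)
t_degraded = junction_temperature(25.0, 0.5, degraded_path)

if __name__ == "__main__":
    print(f"Healthy die attach:  Tj = {t_healthy:.1f} C")
    print(f"Degraded die attach: Tj = {t_degraded:.1f} C")
```

The point of the sketch is only that any increase in the resistance of one element of the path, such as the die attachment, raises the junction temperature for the same dissipated power.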

Degradation Mechanisms of Electrolytic Capacitors

Electrolytic capacitor technologies provide moderate energy density (< 1 kJ/kg) and power density, and are polarity dependent: they have distinct positive and negative terminals and cannot withstand voltage reversal in excess of 1.5 V. Typical applications use moderate to large capacitances (0.1F to 3F) with voltage ratings from 5 to 500 V. The upper operating temperature is typically limited to about 80 to 105°C owing to conduction effects and reliability concerns. The electrical conductivity of the electrolyte increases as temperature increases.

Volatilisation of the electrolyte from the cylinder at high temperatures leads to a reduction in capacitance and an increase in the equivalent series resistance (ESR) of ‘wet’ electrolytic capacitors. The impact of the reduced capacitance depends on the application in which the capacitor is employed.

Other factors that affect degradation are the stability of the oxide layer (dielectric) on the anode, which may deteriorate under high voltage and high temperature, and the interaction of the anode and cathode foils with the electrolyte. The stability of the sealing elements also matters, because permeation of the electrolyte solvent through the seals may cause the capacitor to dry out. Within the context of protection relays, physical degradation due to temperature interacts strongly with PCB design and topology: capacitors in close proximity to heat generating elements such as high power resistors, transformers and IC packages may wear out faster than others.

B.3 Structural Investigation via 3D X-Ray Microtomography

A detailed structural investigation was undertaken on a number of components. These were selected because they operated above ambient temperature and are therefore considered more susceptible to thermally activated degradation mechanisms. Samples of these components, mainly transistor/IC packages, were extracted from relays with different service histories.

Figure B-2: Internal Structure of HMOS Microcontroller Identified as Operating above Ambient Temperature in SHNB Module 16

Particular attention was paid to die attachments and wire bonds. Signs of packaging-related damage, i.e. die attachment voiding and cracking, were observed. For one component, this damage appeared to have progressed further in a relay with a 17 year service history than in a relay with a 7-8 year history. It is not possible to say whether the observed damage was present in the as-manufactured condition or whether it evolved during operation. Overall, the damage observed in the components was not extensive: the percentage void area beneath die attachments ranged from 2.6% to 7.1%. Thus, although a gradual degradation in thermal resistance and electrical performance is expected over time, significant acceleration of the observed degradation mechanisms is unlikely under the typically benign ambient environmental conditions and in the absence of significant temperature cycling. No signs of bond wire failure were observed.

Figure B-3: Metal can packaged voltage regulator (IC11, Modules 21/23/25, SHNB 101)

Appendix C: IEEE Reliability Test System Load Profile

Table C-1: Weekly Peak Load in Percent of Annual Peak

Week | Peak Load | Week | Peak Load
1 | 86.20% | 27 | 75.50%
2 | 90.00% | 28 | 81.60%
3 | 87.80% | 29 | 80.10%
4 | 83.40% | 30 | 88.00%
5 | 88.00% | 31 | 72.20%
6 | 84.10% | 32 | 77.60%
7 | 83.20% | 33 | 80.00%
8 | 80.60% | 34 | 72.90%
9 | 74.00% | 35 | 72.60%
10 | 73.70% | 36 | 70.50%
11 | 71.50% | 37 | 78.00%
12 | 72.70% | 38 | 69.50%
13 | 70.40% | 39 | 72.40%
14 | 75.00% | 40 | 72.40%
15 | 72.10% | 41 | 74.30%
16 | 80.00% | 42 | 74.40%
17 | 75.40% | 43 | 80.00%
18 | 83.70% | 44 | 88.10%
19 | 87.00% | 45 | 88.50%
20 | 88.00% | 46 | 90.90%
21 | 85.60% | 47 | 94.00%
22 | 81.10% | 48 | 89.00%
23 | 90.00% | 49 | 94.20%
24 | 88.70% | 50 | 97.00%
25 | 89.60% | 51 | 100.00%
26 | 86.10% | 52 | 95.20%

Table C-2: Daily Load in Percent of Weekly Peak

Day | Peak Load
Monday | 93%
Tuesday | 100%
Wednesday | 98%
Thursday | 96%
Friday | 94%
Saturday | 77%
Sunday | 75%

Table C-3: Hourly Peak Load in Percent of Daily Peak

Hour | Winter (weeks 1-8 & 44-52) Wkdy | Winter Wknd | Summer (weeks 18-30) Wkdy | Summer Wknd | Spring/Fall (weeks 9-17 & 31-43) Wkdy | Spring/Fall Wknd
1 | 67% | 78% | 64% | 74% | 63% | 75%
2 | 63% | 72% | 60% | 70% | 62% | 73%
3 | 60% | 68% | 58% | 66% | 60% | 69%
4 | 59% | 66% | 56% | 65% | 58% | 66%
5 | 59% | 64% | 56% | 64% | 59% | 65%
6 | 60% | 65% | 58% | 62% | 65% | 65%
7 | 74% | 66% | 64% | 62% | 72% | 68%
8 | 86% | 70% | 76% | 66% | 85% | 74%
9 | 95% | 80% | 87% | 81% | 95% | 83%
10 | 96% | 88% | 95% | 86% | 99% | 89%
11 | 96% | 90% | 99% | 91% | 100% | 92%
12 | 95% | 91% | 100% | 93% | 99% | 94%
13 | 95% | 90% | 99% | 93% | 93% | 91%
14 | 95% | 88% | 100% | 92% | 92% | 90%
15 | 93% | 87% | 100% | 91% | 90% | 90%
16 | 94% | 87% | 97% | 91% | 88% | 86%
17 | 99% | 91% | 96% | 92% | 90% | 85%
18 | 100% | 100% | 96% | 94% | 92% | 88%
19 | 100% | 99% | 93% | 95% | 96% | 92%
20 | 96% | 97% | 92% | 95% | 98% | 100%
21 | 91% | 94% | 92% | 100% | 96% | 97%
22 | 83% | 92% | 93% | 93% | 90% | 95%
23 | 73% | 87% | 87% | 88% | 80% | 90%
24 | 63% | 81% | 72% | 80% | 70% | 85%
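Tables C-1 to C-3 are normally combined multiplicatively: the load in a given hour, as a fraction of the annual peak, is the product of the weekly, daily and hourly percentages. The sketch below illustrates this calculation. Only a small subset of the table entries is embedded, the names are illustrative, and the annual peak used in the usage example is a hypothetical value rather than one taken from this appendix.

```python
# Subsets of Tables C-1 to C-3 (per cent); extend with the remaining rows as needed.
WEEKLY_PEAK = {1: 86.2, 20: 88.0, 51: 100.0}
DAILY_PEAK = {"Mon": 93, "Tue": 100, "Wed": 98, "Thu": 96,
              "Fri": 94, "Sat": 77, "Sun": 75}
HOURLY_PEAK = {
    ("winter", "weekday"): {1: 67, 18: 100},
    ("winter", "weekend"): {1: 78, 18: 100},
    ("summer", "weekday"): {1: 64, 12: 100},
}


def season_of_week(week):
    """Map a week number (1-52) to the season grouping of Table C-3."""
    if 1 <= week <= 8 or 44 <= week <= 52:
        return "winter"
    if 18 <= week <= 30:
        return "summer"
    if 9 <= week <= 17 or 31 <= week <= 43:
        return "spring_fall"
    raise ValueError(f"week out of range: {week}")


def load_fraction(week, day, hour):
    """Hourly load as a fraction of the annual peak (hour is 1-24)."""
    day_type = "weekend" if day in ("Sat", "Sun") else "weekday"
    hourly = HOURLY_PEAK[(season_of_week(week), day_type)][hour]
    return (WEEKLY_PEAK[week] / 100.0) * (DAILY_PEAK[day] / 100.0) * (hourly / 100.0)


if __name__ == "__main__":
    annual_peak_mw = 2850.0  # hypothetical annual peak, for illustration only
    frac = load_fraction(1, "Mon", 1)
    print(f"Week 1, Monday, hour 1: {frac:.4f} of annual peak, "
          f"{frac * annual_peak_mw:.0f} MW")
```

For example, week 51, Tuesday, hour 18 gives 100% of the annual peak, since all three tabulated percentages are 100% for that hour.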

Appendix D: Reliability Assessment Results for a System with Two SIPS

Table D-1: Reliability Assessment Results for SIPS Interaction

Arch1: SNG process/station bus; Arch2: Dup process / SNG station bus

Level | Sys. state | Index | Arch1 (voting) | Arch1 (vetoing) | Arch2 (voting) | Arch2 (vetoing)
Normal (Level 0) | S1 N&N | Pr | 9.68×10⁻¹ | 9.54×10⁻¹ | 9.81×10⁻¹ | 9.65×10⁻¹
 | S2 D&N | Pr | 3.0×10⁻² | 4.52×10⁻² | 1.78×10⁻² | 5.73×10⁻²
 | S3 S&N | Fr | 1.75×10⁻⁶ | 1.38×10⁻⁵ | 2.07×10⁻⁵ | 1.07×10⁻⁵
 | S4 D&D | Pr | 7.98×10⁻⁴ | 9.87×10⁻⁴ | 6.56×10⁻⁴ | 1.20×10⁻³
 | S5 D&S | Pr | 5.94×10⁻⁶ | 7.03×10⁻⁶ | 4.07×10⁻⁶ | 6.98×10⁻⁶
 | S6 S&S | Fr | 4.46×10⁻⁷ | 4.38×10⁻⁷ | 4.53×10⁻⁷ | 4.31×10⁻⁷

Arch3: SNG process / Dup station bus; Arch4: Dup process/station bus

Level | Sys. state | Index | Arch3 (voting) | Arch3 (vetoing) | Arch4 (voting) | Arch4 (vetoing)
Normal (Level 0) | S1 N&N | Pr | 9.80×10⁻¹ | 9.65×10⁻¹ | 9.94×10⁻¹ | 9.53×10⁻¹
 | S2 D&N | Pr | 1.85×10⁻² | 3.37×10⁻² | 5.39×10⁻³ | 4.60×10⁻²
 | S3 S&N | Fr | 2.05×10⁻⁵ | 1.67×10⁻⁵ | 2.39×10⁻⁵ | 1.36×10⁻⁵
 | S4 D&D | Pr | 7.03×10⁻⁴ | 8.33×10⁻⁴ | 5.01×10⁻⁴ | 1.00×10⁻³
 | S5 D&S | Pr | 4.21×10⁻⁶ | 6.30×10⁻⁶ | 1.40×10⁻⁶ | 7.06×10⁻⁶
 | S6 S&S | Fr | 4.52×10⁻⁷ | 4.44×10⁻⁷ | 4.60×10⁻⁷ | 4.37×10⁻⁷
