ieee transactions on power systems, vol. xx, no. x, xxx …kaisun/papers/2019-tpwrs_zhang... ·...

12
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1 An Online Search Method for Representative Risky Fault Chains Based on Reinforcement Learning and Knowledge Transfer Zhimei Zhang, Student Member, IEEE, Shaowei Huang, Member, IEEE, Ying Chen, Member, IEEE, Shengwei Mei, Fellow, IEEE, Rui Yao, Member, IEEE, and Kai Sun, Senior Member, IEEE Abstract—In the analysis of cascading outages and blackouts in power systems, risky cascading fault chains should be accurately identified in order to do further block or alleviate blackouts. However, the huge computational burden makes online analysis difficult. In this paper, an online search method for representative risky fault chains based on reinforcement learning and knowledge transfer is proposed. This method aims at promoting efficiency by exploiting similarities of adjacent power flow snapshots in operations. After the “representative risky fault chain” is defined, a framework of tree search based on Markov Decision Process and Q-learning is constructed. The knowledge in past runs is accumulated offline and then applied online, with a mechanism of knowledge transition and extension. The proposed learning based approach is verified on an illustrative 39-bus system with different loading levels, and simulations are carried out on a real- world 1000-bus power grid in China to show the effectiveness and efficiency of the proposed approach. Index Terms—Blackout, cascading outages, fault chain, rein- forcement learning, Markov Decision Process, Q-learning, knowl- edge transfer, tree search. I. I NTRODUCTION A More secure and reliable power system has been pursued by both academic and industrial entities perpetually. However, large-scale blackouts triggered by cascading outages [1]-[4] have been verified as major threats to the society. Therefore, it is important to study the cascading outages and then develop effective strategies that assist in blackout mitigation. A cascading outage refers to a process during which sequen- tial outages of individual components arise, weakening the power system or even collapsing it. Great efforts have been made on the analysis and mitigation of cascading outages. However, due to the low probability, rapid propagation and complex mechanism of cascading failures, the defense against blackout is still a quite challenging task. A fault chain (FC) is a sequence of tripped components during the process of cascading outages [5], which captures Manuscript recieved ... This work is supported in part by National Science Foundation of China under Grant 51621065, and in part by ...(Corresponding author: Shengwei Mei and Rui Yao) Z. Zhang, S. Huang, Y. Chen and S. Mei are with the Department of Electrical Engineering, Tsinghua University, Beijing 100084, China. (e-mail: [email protected], [email protected], chen [email protected], [email protected]) R. Yao is with Argonne National Laboratory, Lemont, IL 60439, USA (e- mail: [email protected]). K. Sun is with the Department of Electrical Engineering and Computer Sciences, University of Tennessee, Knoxville, TN 37996, USA (e-mail: [email protected]). the dependency in the outage propagation process. Blackouts usually do not happen abruptly [6]. In contrast, the failure propagates gradually, which eventually causes severe load loss. For example, the process of the 2003 Northeast America blackout lasted more than 2 hours [1]. So the process of cascading outages can be formulated as FCs. The FCs can be utilized for risk assessment [7]-[9], critical component identification [5], [10], [11], fault blocking [12] and other issues. Therefore, it is desirable to identify the FCs of interest in an accurate and efficient manner. According to the state-of-the-art literature, related models can be classified into simplified ones (high-level statistical or topology-based) and detailed ones (power system model- based). The former group, including CASCADE [13] and branching process [14], aims at fast computation. These mod- els extract some fundamental features of cascading failure, but they are less successful in representing the characteristics of the physical models. The latter group, represented by OPA [15] model and its derivatives [16], retains power flow models or even more detailed modeling of the components and processes. However, it is more computationally intensive and its online application is limited. Various techniques have been proposed to enhance the efficiency for risky FC identification mainly based on detailed models. In [5], an index predicting power flow redistribution is exploited to generate risky FCs. Ref. [17] reports a randomly broad search method with later pruning to yield high-risk N -k contingencies. Ref. [18] claims that historical data should be emphasized. Unnecessary transmission security constraints are eliminated to accelerate computation in [19]. Besides, an optimization-based bi-level method [20] is proposed to identify a series of worst N - k contingencies. Both authors in [8] and [9] highlight the impact of different time scales. In [9], a remarkable synthesized index called risk estimation index (REI) is handled to search for risky chains with priority. While a sequential importance sampling (SIS) based strategy [21] is derived to acquire rare but critical FCs. Ref. [22] proposes an algorithm based on artificial intelligence (AI) that probes to FCs with maximal damages. Even if the above acceleration methods are adopted, the identification of risky FCs is still very time-consuming and is hard to be applied online. When faced with constantly varying power flow snapshots, identifying FCs from scratch on each snapshot cannot meet the requirements of the online application. Fortunately, the state of power systems usually

Upload: others

Post on 08-Sep-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1

An Online Search Method for Representative RiskyFault Chains Based on Reinforcement Learning and

Knowledge TransferZhimei Zhang, Student Member, IEEE, Shaowei Huang, Member, IEEE, Ying Chen, Member, IEEE,

Shengwei Mei, Fellow, IEEE, Rui Yao, Member, IEEE, and Kai Sun, Senior Member, IEEE

Abstract—In the analysis of cascading outages and blackouts inpower systems, risky cascading fault chains should be accuratelyidentified in order to do further block or alleviate blackouts.However, the huge computational burden makes online analysisdifficult. In this paper, an online search method for representativerisky fault chains based on reinforcement learning and knowledgetransfer is proposed. This method aims at promoting efficiencyby exploiting similarities of adjacent power flow snapshots inoperations. After the “representative risky fault chain” is defined,a framework of tree search based on Markov Decision Processand Q-learning is constructed. The knowledge in past runs isaccumulated offline and then applied online, with a mechanismof knowledge transition and extension. The proposed learningbased approach is verified on an illustrative 39-bus system withdifferent loading levels, and simulations are carried out on a real-world 1000-bus power grid in China to show the effectiveness andefficiency of the proposed approach.

Index Terms—Blackout, cascading outages, fault chain, rein-forcement learning, Markov Decision Process, Q-learning, knowl-edge transfer, tree search.

I. INTRODUCTION

AMore secure and reliable power system has been pursuedby both academic and industrial entities perpetually.

However, large-scale blackouts triggered by cascading outages[1]-[4] have been verified as major threats to the society.Therefore, it is important to study the cascading outagesand then develop effective strategies that assist in blackoutmitigation.

A cascading outage refers to a process during which sequen-tial outages of individual components arise, weakening thepower system or even collapsing it. Great efforts have beenmade on the analysis and mitigation of cascading outages.However, due to the low probability, rapid propagation andcomplex mechanism of cascading failures, the defense againstblackout is still a quite challenging task.

A fault chain (FC) is a sequence of tripped componentsduring the process of cascading outages [5], which captures

Manuscript recieved ... This work is supported in part by National ScienceFoundation of China under Grant 51621065, and in part by ...(Correspondingauthor: Shengwei Mei and Rui Yao)

Z. Zhang, S. Huang, Y. Chen and S. Mei are with the Departmentof Electrical Engineering, Tsinghua University, Beijing 100084, China.(e-mail: [email protected], [email protected],chen [email protected], [email protected])

R. Yao is with Argonne National Laboratory, Lemont, IL 60439, USA (e-mail: [email protected]). K. Sun is with the Department of ElectricalEngineering and Computer Sciences, University of Tennessee, Knoxville, TN37996, USA (e-mail: [email protected]).

the dependency in the outage propagation process. Blackoutsusually do not happen abruptly [6]. In contrast, the failurepropagates gradually, which eventually causes severe load loss.For example, the process of the 2003 Northeast Americablackout lasted more than 2 hours [1]. So the process ofcascading outages can be formulated as FCs. The FCs canbe utilized for risk assessment [7]-[9], critical componentidentification [5], [10], [11], fault blocking [12] and otherissues. Therefore, it is desirable to identify the FCs of interestin an accurate and efficient manner.

According to the state-of-the-art literature, related modelscan be classified into simplified ones (high-level statisticalor topology-based) and detailed ones (power system model-based). The former group, including CASCADE [13] andbranching process [14], aims at fast computation. These mod-els extract some fundamental features of cascading failure, butthey are less successful in representing the characteristics ofthe physical models. The latter group, represented by OPA [15]model and its derivatives [16], retains power flow models oreven more detailed modeling of the components and processes.However, it is more computationally intensive and its onlineapplication is limited.

Various techniques have been proposed to enhance theefficiency for risky FC identification mainly based on detailedmodels. In [5], an index predicting power flow redistribution isexploited to generate risky FCs. Ref. [17] reports a randomlybroad search method with later pruning to yield high-risk N−kcontingencies. Ref. [18] claims that historical data shouldbe emphasized. Unnecessary transmission security constraintsare eliminated to accelerate computation in [19]. Besides, anoptimization-based bi-level method [20] is proposed to identifya series of worst N − k contingencies. Both authors in [8]and [9] highlight the impact of different time scales. In [9],a remarkable synthesized index called risk estimation index(REI) is handled to search for risky chains with priority. Whilea sequential importance sampling (SIS) based strategy [21] isderived to acquire rare but critical FCs. Ref. [22] proposes analgorithm based on artificial intelligence (AI) that probes toFCs with maximal damages.

Even if the above acceleration methods are adopted, theidentification of risky FCs is still very time-consuming andis hard to be applied online. When faced with constantlyvarying power flow snapshots, identifying FCs from scratchon each snapshot cannot meet the requirements of the onlineapplication. Fortunately, the state of power systems usually

Page 2: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

1st outage (Ls1)

s1 s2

2nd outage (Ls2)

s0

Initial state

send

Absorbing state

(end)

Outages

(slow process)

Blackout

(fast process)

Fig. 1. Diagram of an FC

varies consecutively and mildly, so the FCs of two adjacentpower flow snapshots may be different but similar. Then theproblem arises: how to exploit this similarity so as to searchfor risky FCs efficiently in the online application scenario?

In this paper, a method of representative risky FC searchbased on reinforcement learning (RL) and knowledge trans-fer is proposed. The method first establishes the knowledgeof each component’s vulnerability of the initial power flowsnapshot from a cascading failure simulator or real data.After initialization, the knowledge is transferred to the powerflow snapshot in the next dispatch cycle so that the searchpreference is swiftly updated. That is, a branch outage is morelikely to be searched if the corresponding component proves tobe riskier in the previous power flow snapshot. Meanwhile, theknowledge is also extended by additional search into unchartedbranches. By utilizing known information from previous anal-ysis instead of starting tabula rasa (from scratch) every time,the proposed method avoids low-risk FCs, which significantlysaves time and thus has the potential to be implemented online.The major contributions of this paper are as follows:

1) A reinforcement-learning-based framework of represen-tative risky FC search is constructed. In the framework, thesearch process is modeled as a Markov Decision Process(MDP). Then a general search procedure based on the MDPis presented, where every FC is associated with a series ofactions, and risky ones have positive values of actions.

2) A search strategy based on reinforcement learning withina power flow snapshot is introduced, including the exploitationand exploration of knowledge during the search. It not onlyhelps find more representative risky FCs but also preparesknowledge for future similar snapshots.

3) The mechanism of knowledge transfer across powerflow snapshots for representative risky FC identification isestablished. Then strategies of transition and extension for ac-celerating computation based on learned previous knowledgeis proposed, which significantly enhances the performance ofsearching under the background of varying power flow.

The rest of this paper is organized as follows. Section IIclarifies the target of the proposed search scheme. Section IIIconstructs the framework of representative risky FC searchbased on reinforcement learning. Section IV presents thesearch strategies of offline initialization and online knowledgetransition-extension. Case studies on two power systems arecarried out in Section V. Finally, concluding remarks are drawnin Section VI.

II. PROBLEM FORMULATION

A. FC and Representative Risky FC

An FC represents the process of a cascading outage. Duringa cascading outage, component outages occurs sequentially, asshown in Fig. 1, then the corresponding FC F is denoted as

F = (Ls1, Ls2, ..., Lsn) (1)

where Ls1, Ls2,..., Lsn are components tripped during thiscascading outage.

According to [8], [9], the process of cascading outages canbe divided into two stages, namely the slow process and fastprocess, which have different time scales. The time intervalof two successive outages within the slow process is usuallylong. While in contrast, many outages occur in a short periodof time during the fast process.

Every time one component is removed, the status of thepower system (power flow of transmission lines, power angleof generators, etc.) is changed. In some severe scenarios,consequent outages may be triggered by the initial outages,due to overloading or instability. Then the system steps intothe fast process, and blackout may eventually occur. In thispaper, the criteria for the fast process are listed as follows:

1) If any load loss occurs; or2) If two or more overloads occur at the same time; or3) If cascading overloads occur one after another.In these three kinds of scenarios, the system is critically

risky and consequent outages may happen soon, thereforestepping into the fast process.

Different FCs may share the same initial outages, so all FCscan be organized into a tree layout, as shown in Fig. 2. Theroot node s0 of the tree represents the initial state with nooutages, which is the starting point of all FCs.

An FC F may end up with a certain level of load loss,denoted as Loss(F ), or without any blackout but the givensearch depth limitation M is reached. If Loss(F ) ≥ TH%,where TH% is a given threshold, the FC F is called a riskyfault chain.

The representativeness is defined as follows. Suppose thereare two risky FCs, namely F1 and F2. If all componentsin F1 are also in F2, and these shared components havethe same sequence in both F1 and F2, then F1 representsF2. In a set of risky FCs, if one of the FCs can representothers, or if they cannot represent each other, but an additionalrisky FC can represent all FCs of the set, then the set issaid to have a representative risky FC. The identification ofrepresentative risky FCs remarkably reduces the computationalburden because only a few FCs need to be identified torepresent a large number of risky FCs.

B. Search for Representative Risky FCs

A good online method should not find out risky FCssimply as many as possible. If an FC (L1, L2) is risky, itcan be expected that other similar FCs such as (L3, L1, L2),(L1, L2, L4), ... are also risky. Because they all have thesame critical parts L1 and L2. In this case, although both L3,L4 and (L1, L2) contribute to the risk of cascading outages,(L1, L2) plays a dominating role. Therefore, in this way, one

Page 3: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

ZHANG et al.: AN ONLINE SEARCH METHOD FOR REPRESENTATIVE RISKY FAULT CHAINS BASED ON REINFORCEMENT LEARNING AND KNOWLEDGE TRANSFER3

s0

s1(3)s1

(1)

s2(2) s2

(3)s2(1)

s1(2)

Fig. 2. Tree layout of FCs

can manufacture a mass of risky FCs far more productivelyafter one risky FC is found, than explore other different FCs.But these similar FCs share the same core, and thus most ofthem are pointless.

To this end, a practical framework should be proposed toidentify representative risky FCs. In this paper, the object ofthe search scheme is to maximize the number of obtainedrepresentative risky FCs in a given number Nmax of searchtrials.

III. FRAMEWORK OF REPRESENTATIVE RISKY FAULTCHAIN SEARCH

A. MDP Model of FC Search

According to [9], [21], the power system has the Markovianproperty, which means that given a state, information includingpower flow, generation re-dispatch and load shedding canbe inferred deterministically by its previous state and thestate transition. In order to find out representative risky FCsactively, this paper introduces a virtual agent (attacker) tosimulate all the factors causing component outages. The agentaims to cause intense load loss through a series of actions(removing components), which can be regarded as a dynamicprogramming problem. Therefore, how the agent will takeactions can be modeled as an MDP.

The MDP consists of a 4-tuple (S, A, T , R), representingthe state, action, probability of state transition, and reward,respectively.

1) State: the state of a system comprises its components’states. For a component, such as a transmission line, a trans-former or a generator, its operating status is modeled as abinary variable which denotes normal operation or tripped.Then, the state of the whole system can be denoted by asequence of tripped components. As different FCs may sharethe same initial outages, all FCs can be organized into a treelayout, shown in Fig. 2.

2) Action: at each step, the agent selects one availablecomponent and removes it, pushing the outages to develop.Although modern power systems conform to N − 1 security,in some severe scenarios, it will be broken, meaning that theagent is able to act multiple times. A component cannot be

removed twice, so it is labeled unavailable after removed.If the system steps into the fast process, the agent will letcascading outages propagate spontaneously. Then the systemarrives at an absorbing state, which has no available action.Therefore, the sequence of removals comes to an end.

Here, the so-called “removals” may be actual physical/cyberattacks conducted by terrorists or just outages caused by hid-den failure, catastrophic weather or other non-human factors.As Ref. [9] figured out, the time interval of two successiveoutages during the slow process is usually long, typically 3-20minutes. So it is reasonable to assume that the agent removescomponents sequentially. That is, an action of removal isassociated with one component, which can reduce the numberof actions considerably.

3) State transition: after an action a, a new state is reachedfrom the previous one s. During the slow process, the removalattempt has a probability γ of success, so the distribution ofthe new state is {

Pr(s′|s, a) = γ

Pr(sfail|s, a) = 1− γ(2)

where s′ is the state which has exactly one more component(the removed component) than s, and sfail is a variant of theabsorbing state. If the system comes into sfail, no new outagewill occur and the cascading outage stops propagation.

However, after the fast process of cascading outages, thenext state is an absorbing state. This transition is stochasticand hard to predict, due to the complicated mechanism ofcascading failures.

4) Reward: the immediate reward R(s′|s, a) is defined asthe instant loss of the transition from the current state s to anew state s′ by action a. In some cases, part of the load mustbe shed to maintain the equilibrium of the power system afterremoval of an element. According to (3), the difference of theload between these two states is called “instant loss”.{

R(s′|s, a) = Load(s)− Load(s′)R(sfail|s, a) = 0

(3)

If the fast process of cascading tripping is triggered, statetransition is random and therefore s′ is not deterministic, sothe reward of this very last removal is calculated according toits expectation

R(sn−1, an) =∑sab

R(sab|sn−1, an)Pr(sab|sn−1, an)

=∑sab

(Load(sn−1)− Load(sab))Pr(sab|sn−1, an)

(4)where n is the length of slow-process FC, sn−1 is the lastslow state before the fast process, an is the very removal thattriggers fast process, sab is a possible absorbing state derivedfrom sn−1.

Suppose the agent will act n times from a certain state s0,then its aggregate reward under a certain strategy π is

V π(s0) =Pr(sπ(s0)|s0, π(s0))(R(sπ(s0)|s0, π(s0)) + V π(sπ(s0)))

+Pr(sfail|s0, π(s0))R(sfail|s0, π(s0))(5)

Page 4: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

4 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

where π(s) is the action taken by the agent at the state saccording to the strategy π, sπ(s) is the next state after actionπ(s) is successfully taken, and sπt is the t-th state starting froms0 of π. According to (2) and (3), V π(s0) can be expandedrecursively as

V π(s0) =

n∑t=0

γtR(sπt+1|sπt , π(sπt )) (6)

Since the n actions constitute an FC, the aggregate rewardV π(s0) is associated with the loss of the FC.

B. RL-Based Solution

The proposed MDP is solved after every R and V is known.If so, all risky FCs (and including representative risky FCs) canbe acquired by all the terminal R(sn−1, an). Theoretically, thesolution can be achieved by classical dynamic programming(DP) like Bellman iteration, which relies on enumerating. Butit is difficult to solve directly because the number of FCs/statesis numerous.

RL is a branch of machine learning, which concerns how anagent takes optimal actions in an environment to maximize itsreward by self-study [23]. Compared with classical dynamicprogramming, RL is able to approach the optimal strategywithout enumerating all possible branches. Thus it is advisablefor relatively complex and time-varying systems [24], forexample, the representative risky FC search.

In this paper, a technique based on the classical Q-learningis adopted to approach V iteratively and incrementally. Thistype of Q-learning has a clear structure, thus is suitable forsolving the RL problem with MDP. The formation of the Q-learning is a value Q iteration update, using the weightedaverage of the old value and the new information as follows

Q(k+1)(st, at) =(1− α)Q(k)(st, at)+

α(R(st+1|st, at) + γmaxat+1

(Q(k)(st+1, at+1)))

(7)where the weight α ∈ [0, 1] is the learning rate, determin-ing to what extent newly acquired knowledge overrides oldknowledge.

An episode (corresponding to a search trial) of the Q-learning algorithm ends up with an absorbing state. Then theQs of related state-action pairs are updated by a backwardtraverse on the tree to speed up the iterative process, whichis different from the traditional Q-learning method. This isbecause major loss occurs at the fast process (leaf nodes ofthe tree). For any absorbing state sab, there is no availableaction, so Q(sab, a) is always zero. Therefore, if st+1 is anabsorbing state, Q is updated by (8), which is derived from(7):

Q(k+1)(st, at) = (1−α)Q(k)(st, at)+αR(s(k+1)t+1 |st, at) (8)

α =1

N(st, at)(9)

where N(st, at) is the visit count, so the effect of (8) iscalculating the average loss of N(st, at) distinct trials.

If st+1 is not an absorbing state, the learning rate α is 1.Then the Q(st, at) is the highest expected loss caused by at

s0

s1(3)s1

(1)

s2(2) s2

(3)s2(1)

L3

L1 L4

L2L1

L3

L2 L4

L3

s1(2)

L4

L3 L5×

Q=0.075

Q=0 Q=0.15Q=0.13

Q=0.26

Q=0 Q=0 Q=0.24 Q=0.3 Q=0

Fig. 3. An example of states with their Qs (γ = 0.5)

and future actions, as shown in Fig. 3. Since the probabilityand reward can be obtained according to (2) and (3), the valueQ(st, at) can be decided by its descendent states. Thus, Q isupdated by (10), which is also derived from (7):

Q(k+1)(st, at) = R(st+1|st, at)+γmaxat+1

(Q(k+1)(st+1, at+1))

(10)

C. Overall Procedure of Representative Risky FC Search

In order to avoid getting too many non-representative riskyFCs, discovered risky FCs are stored. At a certain step of asearch, suppose the agent has taken some actions sequentially.Before taking the current action, actions having been taken(ai1, ai2, ...) combined with the selected action aik corre-sponds to an FC candidate, which is compared to the storedFCs. If a stored FC can represent the FC candidate, thisaction is also labeled unavailable for future searches, and theagent should select another action. For example, in Fig. 3,L3 is unavailable after (L1, L2) tripped because FC (L1, L2)has been identified as risky. Moreover, an action is labeledunavailable when it makes the system move to a state withoutany available action.

When taking actions, the agent should make tradeoff be-tween exploitation and exploration to avoid falling into localoptimum. So the agent performs exploration with the proba-bility ε1, which is focused on searching unexplored or lessexplored paths. Otherwise, the agent exploits the currently“well-behaved” options with larger Qs. The probability ε1declines gradually as the search proceeds, so the agent is morelikely to explore at the beginning and then tends to exploit itsknowledge.

The overall procedure of RL-based representative risky FCsearch is shown in Algorithm 1. In the algorithm, the agentshould always choose an available action according to prioror learned knowledge. This is crucial to the effectiveness ofthe Q-learning and will be presented in the next Section.

Page 5: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

ZHANG et al.: AN ONLINE SEARCH METHOD FOR REPRESENTATIVE RISKY FAULT CHAINS BASED ON REINFORCEMENT LEARNING AND KNOWLEDGE TRANSFER5

Time

Snapshot1 Snapshot3Snapshot2 Snapshot4

Initialization

Offline training

Transition-Extension

Online application

Transition-Extension

Online application

Transition-Extension

Online application

Online new data Online new data Online new data

Q Q Q Q

Representative risky FCs Representative risky FCs Representative risky FCs

Fig. 4. Illustration of consecutive learning

Algorithm 1 Representative risky FC search based on RLSet initial value Q = 0, visit count N = 0 for all state-action pairsfor each episode (from 1 to Nmax) do

Reset power flow according to the initial state s0for each step of the episode (before fast process) do

Select an available action ai according to...if rand(0,1) ≤ ε1 then

exploration: ... to prior knowledgeelse

exploitation: ... to current Q (learned knowledge)end ifTrip the selected branchN(s, ai)← N(s, ai) + 1Update power flowif fast process or the length of the FC ≥M then

No more actions, evolve until absorbing stateelse

Set s← s′

end ifCalculate R(s, ai)

end forif R > TH% then

Store the discovered risky FCend ifUpdate Q backward according to (8) or (10)Update the availability of actions backwardUpdate (decrease) ε1

end forObtain representative risky FCs from the stored risky FCs

IV. RL-BASED OFFLINE AND ONLINE SEARCHSTRATEGIES

A. Consecutive Learning

The operation status of real-world power systems varieswith time. Therefore, a series of different power flow snapshotscan be extracted. Each of them has a set of representative riskyFCs, which should be identified online individually within theduration of the snapshot.

Considering the gradual changes in two adjacent power flowsnapshots, the knowledge of representative risky FCs can beshared. Therefore, this paper proposes a knowledge transitionprocess, which is inspired by the idea of transfer learning [26].

The process is divided into two parts: offline initialization andconsecutive online transition-extension, as shown in Fig. 4.

The initialization is the acquisition of initial knowledgeof FCs. At the beginning, which FCs of the system arerisky is unknown. Therefore, high-risk states and actions needto be identified from scratch. This step may be very time-consuming but fortunately only needs to be executed once atthe beginning.

Transition-extension is the application and continuous up-date of knowledge. It is performed after new power flowsnapshots are obtained.

The transition prefers to exploit previously learned knowl-edge. Unless large disturbances occur, the difference betweentwo adjacent dispatch snapshots is usually insignificant as thetime interval is short, typically 15 minutes. So risky FCs inthe previous snapshot may also remain risky in the upcomingsnapshot.

In contrast, the extension tends to explore less “valuable”paths, because actions seemingly not dangerous may be ac-tually risky but have not been discovered before. Moreover,because of the variation of power flow, some low-risk FCsmay become risky in the new power flow snapshot. Therefore,moderate additional trials can be helpful complements to thehistorical experience exploitation and are needed to identifymore risky FCs.

After the online refinement, the revised knowledge Q is thenapplied to the next snapshot, which launches a new transition-extension process.

B. Initialization: Search without Previous Knowledge

In initialization, trials are rather random and blind. Theagent in MDP cannot evaluate how much risk is caused byan outage of a transmission line, until the agent removes itand simulate the subsequent events. Therefore, the agent needsto use an index to estimate the severity of each outage andguide the search, which is called prior knowledge. Accordingto experience, the heavier the power flow of a componentis, the higher impact on the grid of its outage is imposed.Therefore, a power-flow-weighted (PF-weighted, or PFW forshort) index WPF based on Softmax function [25] is proposedhere to determine the probability of an action ai

Pr(ai|s) =WPF (Ci|s)∑mj=1WPF (Cj |s)

(11)

Page 6: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

6 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

WPF (Ci|s) = exp(ks × PF (Ci|s)√N(s, ai) + 1

) (12)

where PF (Ci|s) is component Ci’s power flow under state s,N(s, ai) is the visit count of action ai (removal of Ci) at thecurrent state, and ks is an adjustable positive parameter. Theparameter ks is necessary, because the value of PF (Ci|s) maybe very large to cause arithmetic overflow. In this paper, ks isset to be

ks =lnE

maxi(PF (Ci|s)/√N(s, ai) + 1)

(13)

where E is a constant used to adjust the agent’s preference. Alarger E means that the agent is more likely to remove heavilyloaded components. According to (13), the maximum value ofWPF is E.

The probability ε1 is decided by the power flow and visitcount of the initial state s0

ε1 = max(

∑mi=1 (PF (Ci|s0)/

√N(s0, ai) + 1)∑m

i=1 PF (Ci|s0), ε1m) (14)

where ε1m is used to maintain the minimum level of explo-ration.

If the agent would like to exploit obtained knowledge Q, itwill select components with max weighted Q

ai = argmaxi

Q(s, ai)√N(s, ai) + 1

(15)

C. Transition-Extension: Search Based on Knowledge

After initialization, the knowledge of risky FCs of a certainpower system is stored in Q. When the updated power flowsnapshot is available, the agent can search risky FCs ofthe snapshot based on the previous knowledge and the newknowledge of the new power flow snapshot.

1) Knowledge transition: the selection of actions can beguided by the previous knowledge Q0, which is the Q of thelast snapshot. Therefore, in the transition stage, the probabilityof an action is determined as follows

Pr(ai|s) =WQ0(ai|s)∑mj=1WQ0(aj |s)

(16)

WQ0(ai|s) = exp(Q0(s, ai)√N(s, ai) + 1

)× sgn(Q0(s, ai)) (17)

It is different from (15), because externally learned knowl-edge Q0 is used instead of Q.

2) Knowledge extension: if the agent decides to extend itsknowledge, it takes the same strategy as the initialization (11)-(12). Since the knowledge extension is used online, the limitfor the number of search trials is much fewer than that ofinitialization.

3) Online mixed strategy: to further enhance the searchefficiency, this paper proposes a mixed strategy to balancetransition and extension. Similar to the ε-greedy strategy inAlgorithm 1, the mixed strategy uses another probability ε2 totransfer the previous knowledge Q0: for a search, if a randomnumber rand(0,1) ≤ ε2, components are removed according to(16) by the principle of transition; if not, according to (11)

Algorithm 2 Exploration and exploitation in transition-extension

Select an available action ai according to...if rand(0,1)≤ ε1 then

exploration:if rand(0,1)≤ ε2 then

... to previous knowledge WQ0 (transferred knowledge)else

... to power flow WPF (prior knowledge)end if

elseexploitation: ... to current WQ (newly learned knowledge)

end if...Update (decrease) ε1 and ε2

TABLE ITEST PLATFORM AND SIMULATION PARAMETERS

New England SichuanHardware Intel Xeon E3-1230v3+12GB RAMSoftware MATLAB R2018aPF Type DC Power Flowγ in (7) 0.5

M in Algo. 1 3 6TH% in Algo. 1 1% 0.2%

E in (13) 100 200ε1m in (14) 0.01ε20 in (18) 1.3 0.95

by extension. The ε2 is also decided by the visit count of theinitial state s0

ε2 = min(ε20 ×∑i (Q(s0, ai)/

√N(s0, ai) + 1)∑

iQ(s0, ai), 1) (18)

where ε20 is an initial constant. Like ε1, the ε2 also de-creases gradually, so the agent inclines to transfer the previousknowledge at the early stage, then more likely to extend theknowledge as the number of search trials increases.

The online step actually has three strategies. Firstly, theagent has a probability ε1 of exploration. Then according to ε2,the agent will decide to whether explore according to previousknowledge (16) or power flow (11). The procedure is shownin Algorithm 2.

V. CASE STUDY

In this section, numerical simulations are performed tovalidate the proposed FC search framework based on RL andknowledge transfer. The New England 39-bus test system isused to study the feasibility and effectiveness of the proposedapproach. Then a test case on a real-world provincial gridis used to test the effectiveness of the proposed consecutivelearning. The DC power flow is adopted in this paper tosimulate cascading outages. Other models compatible withAlgorithm 1, including AC power flow model and others men-tioned in [27] are also applicable to the proposed framework.The specifications of the test platform and the simulationparameters are listed in Table I, where the value of TH%is a specific percentage of the total load of the system.

Page 7: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

ZHANG et al.: AN ONLINE SEARCH METHOD FOR REPRESENTATIVE RISKY FAULT CHAINS BASED ON REINFORCEMENT LEARNING AND KNOWLEDGE TRANSFER7

TABLE IICOMPARISON OF FCS

Snapshot Risky FCs Shared FCs Similar FCs Unique FCsS1 3303 3244 50 9S2 3383 12 127

0

0.90

2000

0.92

4000

0.94 1.01

λS1

0.960.98

λS211.01

Num

ber o

f ris

ky F

Cs

10.98

0.960.940.92

0.90

Fig. 5. 2-dimensional comparison of shared chains

A. IEEE 39-Bus New England Test System

There are 39 buses (including 10 generators) and 46branches (including 12 transformers and 34 transmission lines)in the New England test system. In order to verify the feasi-bility, a security screening of N − 2 and N − 3 contingenciesis conducted. As it is a small system, the computational costis affordable.

1) Verification of the similarity of FCs between two similarsnapshots: FCs from two similar but different power flowsnapshots should be firstly compared, because the similarityis the premise of knowledge transfer. In order to illustrate thesimilarity and difference, FCs are compared and then theirsimilarity is classified as follows:

1) Two FCs F1 and F2 are defined as shared FCs if theirdenotations are identical.

2) If they are not shared but F1 is a subset or superset ofF2, they are called similar FCs. Similar FCs are also helpful

0

500

0.900.92

1000

0.94 1.01

λS1

0.960.98

λS211.01

Num

ber o

f ris

ky F

Cs

10.980.960.940.920.90

Fig. 6. 2-dimensional comparison of unique chains (S2 to S1)

TABLE IIISEARCH TRIALS NEEDED TO DISCOVER 95% OF ALL FCS

Strategy Search Trials Risky FCs EfficiencyN1 N2 N2/N1

PFW+RL+T-E 3512

3303

0.940PR+RL+T-E 3542 0.933

REI 6622 0.499PFW+RL 11434 0.289PR+RL 12020 0.275

PFW 11738 0.281PR 12689 0.267

for knowledge transfer. The number of similar FCs of twosnapshots may be different because one FC may be similar tomany FCs of the other snapshot.

3) Otherwise, they are unique FCs to each other. If an FC ofsnapshot S2 is unique to every FC of snapshot S1, it is uniqueto S1. The number of unique FCs may also be unequal.

For example, FC (L1, L2) is similar to both (L1, L2, L3)and (L1, L4, L2), while the latter two are unique to each other.

Two power flow snapshots are constructed and compared.S1 is the base case, and S2 is constructed by amplifying theinjecting power of all buses (generators and substations) ofS1 proportionally to 1.01 times (denoted as 1.01x). The resultis shown in Table II. It can be seen that most of the FCsare shared, verifying the similarity of the two power flowsnapshots and thus the rationality of the knowledge transfer.

The similarity is then verified in a broader range. Sevenpower flow snapshots are constructed in the same way asS2, and the amplification rate λ varies from 0.90x with anincrement of 0.02x. Limited by N − 1 security, the maximumvalue of the amplification rate λmax is 1.013, therefore the lastsnapshot is not 1.02x but 1.01x. Comparisons of 2-dimensionalFC pairs are shown in Fig. 5 and Fig. 6.

The number of risky FCs increases with the growth of λ,which is expected. Snapshots with closer amplification rateλ have more shared risky FCs. So the effect of knowledgetransition may be better if the two power flow snapshotsare more similar. However, no matter how close the twosnapshots are, there are always some different (similar orunique) FCs, which also proves the necessity of knowledgeextension proposed in section IV.

2) Verification of the efficiency of RL and knowledge trans-fer: firstly, the efficiency of the proposed RL mechanism isillustrated, compared to strategies without RL. Two kinds ofprior knowledge are applied. One is purely random search(PR), where the agent always randomly selects an availablecomponent to remove with equal weight. The other one is PF-weighted (PFW) search, where the agent selects componentwith weights that are determined by (11). Both two strategiesare then combined with RL according to Algorithm 1, wherethe agent removes components according to prior or learnedknowledge. Therefore, four strategies are tested and compared,denoted as PR, PFW, PR+RL, PFW+RL, respectively. Thenumbers of search attempts needed to cover 95% of all riskyFCs are shown in Table III.

Methods with RL perform better than those without RL,meaning that they find more risky FCs than their non-RLcounterparts at the same search trials. This is because RL

Page 8: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

8 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

0 2000 4000 6000 8000 10000 12000Search Trials

0

500

1000

1500

2000

2500

3000N

umbe

r of

ris

ky F

Cs

PFW+RL+T-EREIPFWPR

Fig. 7. Efficiency of transition-extension (base snapshot S1, with knowledgefrom S2)

0 2000 4000 6000 8000 10000 12000Search trials

0

20

40

60

80

100

Acc

umul

ated

ris

k/%

PFW+RL+T-EREIPFWPR

Fig. 8. Accumulated risk discovered by different methods

can continuously use the knowledge accumulated in the searchprocess.

In order to illustrate the power of the proposed transition-extension process, a search strategy called REI from [9] isintroduced. The REI method comprises three kinds of risk,namely risk of system separation, risk of overloading, andsecondary risk. In this method, the agent tends to removebranches with high REI values. In contrast, the proposedtransition-extension process based on RL and PFW (denotedas PFW+RL+T-E) uses the external learned knowledge fromthe heavily loaded power flow snapshot S2 (1.01x). Resultsof the RL+T-E method and three non-RL or T-E ones (REI,PF-weighted, PR) are shown in Table III and Fig. 7.

The proposed PFW+RL+T-E based method performs best.Only about 1/4 trials are needed if this technique is adopted.The sharply rising section of the solid curve refers to thetransition dominated stage, while the nearly horizontal sectionwith flat slope is the extension dominated stage which can findthe omitted minority of risky FCs.

While REI also performs well, it cannot utilize existingknowledge. So the T-E method finds out more risky FCs thanREI does at the same number of search trials. The PFW searchdoes outperform the purely random search at the beginning,but later its rising speed becomes slower. Therefore, the PFWsearch cannot accelerate the search substantially. Although allthe four methods in Fig. 7 can eventually identify all risky FCs,but their efficiency is different. In online application scenarios,

0 2000 4000 6000 8000 10000 12000Search trials

0

500

1000

1500

2000

2500

3000

Num

ber

of r

isky

FC

s

0.96x to 1.01x0.90x to 1.01x0.96x to 0.90x1.01x to 0.90x

Fig. 9. Efficiency of transition-extension process of different combinations

search trials are usually very limited. Under this situation, theadvantage of the T-E based method is more prominent.

In order to quantify the criticality of those found riskychains, the accumulated risk of three different methods arecalculated. The proposed T-E based method not only findsrisky chains faster but also discovers more risk, as shown inFig. 8. After 3512 search trials, the proposed RL+T-E basedmethod has covered 95% of all risky FCs and 95.6% of theaccumulated risk of the system. Meanwhile, the REI methodonly covers 65.8% of accumulated risk at the same searchtrials.

Four more different power flow snapshot pairs are alsotested. Results in Fig. 9 shows that the efficiency of thetransition-extension process is influenced by the differencebetween the adjacent snapshots, which accords with the simi-larity comparison shown in Fig. 5. The nearly but not fully flatslope of 4 curves after turning points highlights the necessityof knowledge extension, especially when the loading level ofthe latter snapshot is higher than the previous snapshot.

B. 1000-Bus Sichuan Grid Model: Consecutive Learning

Different from the New England 39-node test system, thereare backbone buses (500kV) as well as many peripheralbuses with a lower voltage (≤220kV), which make the struc-ture complicated. Sichuan Grid model has 1080 buses and1628 branches. However, not all of them are available ineach dispatch cycle (15 minutes) because of unit start-stopand switching actions. Therefore, the topology of SichuanGrid may vary slightly and the concrete number of busesand branches of each snapshot are different. Meanwhile, thechange in topology is mild and gradual. 923 (85.5%) of thebuses remain unchanged during the day, which guarantees thesimilarity of snapshots of Sichuan Grid. It can be inferredfrom the number that there exists more than 1018 possiblefault combinations, which cannot be enumerated.

The fluctuation of the load (as shown in Fig. 10) variesasynchronously, which differs from the New England test sys-tem. Such a characteristic results in a more irregular tendency.In addition, the loading level of Sichuan Grid model is ratherlow, which means that the FCs able to trigger fast process arelonger but fewer. Therefore, FCs not longer than 6 branches areconcerned in this case. The representativeness of risky FCs is

Page 9: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

ZHANG et al.: AN ONLINE SEARCH METHOD FOR REPRESENTATIVE RISKY FAULT CHAINS BASED ON REINFORCEMENT LEARNING AND KNOWLEDGE TRANSFER9

Load

/p.u

.BQDY

JPFL

200

250

300

350

YB Renewable energy

Total load

00:00 06:00 12:00 18:00 24:00Time

10

20

30

40

50

.

Fig. 10. Power fluctuation of some major buses in Sichuan Grid

Num

ber o

f rep

rese

ntat

ive

risky

FC

s

PFW+RL PFW REI

0 1 2 3 4 5

104

0

20

40

60

80Snapshot 00:00

0 1 2 3 4 5Search trials 104

0

50

100

150

200

Search trialsSnapshot 18:00

Fig. 11. Efficiency of RL in Sichuan Grid

important in Sichuan Grid model. For example, while the PFWmethod finds 397 and 903 risky FCs in snapshot 00:00 and18:00 respectively, only 80 and 181 of them are representative,of which the proportions are both lower than 20%.

1) Efficiency of RL within a single snapshot: in eachdispatch cycle, a power flow snapshot is retrieved from EnergyManagement System (EMS). In this part, two power flowsnapshots of 00:00, and 18:00 are used for test, which repre-sent lightly and heavily loaded scenarios respectively. As forsearch strategies, REI, PFW, and PFW+RL are performed withsearch trials of 50000 times on each of the three snapshots.Numbers of representative risky FCs identified by each methodare shown in Fig. 11. The result of purely random strategy isnot listed because it only finds very few risky FCs in SichuanGrid model.

In this large grid, RL still has a moderate effect of ac-

celeration. However, REI does not perform so well as inthe small New England test system. It is because SichuanGrid model is overall lightly loaded, making the secondaryrisk (the third part) of REI not distinguishable. So in thelightly loaded scenario of 00:00 snapshot, REI identifies evenfewer representative risky FCs than PFW and PFW+RL. Inthe following part, when compared to the knowledge transfermethod, only PFW is concerned.

2) Power of knowledge transfer during online consecutivelearning: 96 real power flow snapshots of Sichuan Grid modelon a certain day are used for online learning. When a newsnapshot is obtained, the online transition-extension learningmethod is applied. Before the first 00:00 snapshot, the offlineinitialization is performed based on the 21:00 snapshot on theprevious day.

The source snapshot is selected mainly because it is heavilyloaded and thus has more risky FCs. A search of 9.6 × 105

times is performed, and then the knowledge is transferredto the 00:00 snapshot. These search trials guarantees thatenough representative risky FC can be found while avoidoverfitting. Every online transition-extension process searches50000 times, which costs only 9-12 minutes and is shorter thanthe duration of a dispatch cycle, conforming to the requirementof online application. If an FC is not transferred successfully,its risk is discounted by multiplying γ, and then is passed tothe next snapshot. For a fair comparison, the control groupsearch each snapshot 60000 times according to (11) (PFWwithout RL or knowledge transfer), as if always treated as theinitialization procedure. As a result, the two methods have thesame amount of total search trials (5.76 × 106) in a day andboth cost about 19 hours. Therefore, the proposed mechanismof RL and knowledge transfer does not substantially affect thecomputational efficiency.

As shown in Fig. 12, the consecutive RL-based learningcan always identify more representative risky FCs than thesearch without RL or T-E. On average, the proposed methodfinds 86.7% more than the control group. Test results alsoshow that T-E finds more and more representative risky FCsthan the control group as the time elapses, which reflects theeffective accumulation of knowledge in the proposed method.During the former half of the day, the T-E finds 103 morerepresentative risky FCs than its rival on average, while thenumber grows to 130 during the afternoon.

Fig. 13 shows the search process of discovered representa-tive risky FCs by the two methods. The proposed T-E approachfirst prefers to transfer learned external knowledge, which isthe major contributor to the effective identification of riskyFCs reflected by the sharp rise of the 3 blue solid curves.Then it turns to extension and exploitation of its own learnedknowledge, resulting in a slower but still lasting increase,which is the same as Fig. 7 and Fig. 9.

3) Discussions on power flow models: To prove that theproposed Algorithm 1 is compatible with AC power flow, apreliminary test is applied to the 00:00 snapshot of SichuanGrid in this part. This test uses AC power flow based on theprocess of physical simulation proposed by [28], of which theresults are shown in Table IV.

According to the results, the time consumed by a search trial

Page 10: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

10 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

00:00 02:00 04:00 06:00 08:00 10:00 12:00 14:00 16:00 18:00 20:00 22:00 24:00Time

0

100

200

300

400

500

600N

umbe

r of

rep

rese

ntat

ive

risky

FC

sConsecutive T-EFrom scratch

Fig. 12. Efficiency of consecutive learning

0 6

104

0

50

100

150

200

Num

ber o

f rep

rese

ntat

ive

risky

FC

s

Snapshot 02:45

0 6

104

0

200

400

600

1 2 4 53 Search trials

Snapshot 10:45

0 1 2 3 4 5 6Search trials 104

0

100

200

300

400

1 2 4 53 Search trials

Snapshot 18:45

with T-E without T-E

Fig. 13. Details of consecutive learning

TABLE IVCOMPARISON BETWEEN AC AND DC POWER FLOW IN SICHUAN GRID AT

00:00 SNAPSHOT

AC Power Flow DC Power FlowComputational Speed 0.15 0.0144per Search Trial/s

Search Trials 6000 50000Total Computational 15 12Time/min

Representative Risky FCs 83 80

with AC power flow model is 10.4 times that with DC powerflow on average. In order to meet the requirement of the onlineapplication, the computational time for the identification ofonline representative risky FCs should be less than a dispatchcycle. Therefore, fewer search trials are performed. Whensearching 6000 times, the identification of representative riskyFCs under AC power flow model consumes 15 minutes, whichis equal to the dispatch cycle. Meanwhile, these 6000 searchtrials find 83 representative risky FCs combined with theproposed PFW+RL technique, which is close to the numberof DC power flow. Although the number of search trials ofAC power flow is less than that of DC power flow, the ACpower flow model is able to find out risky FCs that cannot beidentified by DC power flow by considering the voltage andreactive power. According to this preliminary result, both ACand DC power flow models are acceptable and practical foronline application. The authors will do further research basedon AC power flow in the future.

VI. CONCLUSION

To identify representative risky fault chains more efficiently,this paper proposes an online search method based on rein-forcement learning and knowledge transfer. The search forfault chains is modeled as a Markov Decision Process, andthen a Q-learning-based method is applied to the search, high-lighting the utilization of historical and current knowledge.After the mechanism of knowledge transition and extensionestablished, strategies for offline training and online applica-tion are presented. Test results of New England test systemand Sichuan Grid model demonstrate the effectiveness andthe efficiency of the proposed approach, which is promisingfor online deployment. Based on the proposed approach, theauthors also plan to exploit the identified fault chains to blockor mitigate cascading outages in future studies.

However, it should be noted that producing fault chains isjust the first step for blackout mitigation. The authors wouldexploit fault chains to block or mitigate cascading outages inthe future.

ACKNOWLEDGMENT

The authors would like to thank Kun Song, Boda Li,Dr. Yue Xia and Dr. Junjian Qi, for their valuable suggestions

Page 11: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

ZHANG et al.: AN ONLINE SEARCH METHOD FOR REPRESENTATIVE RISKY FAULT CHAINS BASED ON REINFORCEMENT LEARNING AND KNOWLEDGE TRANSFER11

during the writing and revision of the paper.

REFERENCES

[1] B. Liscouski, W. Elliot. Final report on the August 14, 2003 blackout inthe United States and Canada: causes and recommendations. A report toUS Department of Energy, Tech. Rep. 2004.

[2] Costas Vournas. Technical summary on the Athens and Southern Greeceblackout of July 12, 2004. National Technical University of Athens, 2004.

[3] A. S. Bakshi, A. Velayutham, S. C. Srivastava, et al. Report of the enquirycommittee on grid disturbance in northern region on 30th July 2012 andin northern, eastern & north-eastern region on 31st July 2012. Ministryof Power Government of India, Tech. Rep. 2012.

[4] H. Guo, C. Zheng, et al. A critical review of cascading failure analysisand modeling of power system. Renewable and Sustainable EnergyReviews, vol. 80, pp. 9-22, 2017.

[5] A. Wang, Y. Luo, G. Tu, and P. Liu. Vulnerability Assessment Scheme forPower System Transmission Networks Based on the Fault Chain Theory,IEEE Transactions on Power Systems, vol. 26, pp. 442-450, 2011.

[6] S. Mei, X. Zhang, and M. Cao. Power grid complexity. pp. 3-4, SpringerScience & Business Media, 2011.

[7] P. Rezaei, P. Hines, and M. J. Eppstein. Estimating Cascading Failure RiskWith Random Chemistry. IEEE Transactions on Power Systems, vol. 30,2015.

[8] P. Henneaux, P. Labeau, J. Maun, and L. Haarla. A Two-Level Probabilis-tic Risk Assessment of Cascading Outages. IEEE Transactions on PowerSystems, vol. 31, pp. 2393-2403, 2016.

[9] R. Yao, S. Huang, K. Sun, F. Liu, X. Zhang, S. Mei, et al. Risk Assessmentof Multi-Timescale Cascading Outages Based on Markovian Tree Search.IEEE Transactions on Power Systems, vol. 32, pp. 2887-2900, 2017.

[10] Z. Zhang, S. Huang, Y. Chen, S. Mei, et al. Key Branches identifica-tion for cascading failure based on q-learning algorithm. 2016 IEEEInternational Conference on Power System Technology (POWERCON),Wollongong, NSW, 2016, pp. 1-6.

[11] X. Wei, J. Zhao, T. Huang, and E. Bompard. A Novel Cascading FaultsGraph Based Transmission Network Vulnerability Assessment Method.IEEE Transactions on Power Systems, vol. 33, pp. 2995-3000, 2017.

[12] R. Yao, K. Sun, F. Liu, and S. Mei. Management of CascadingOutage Risk Based on Risk Gradient and Markovian Tree Search. IEEETransactions on Power Systems, vol. 33, pp. 4050-4060, 2018.

[13] I. Dobson, B. A. Carreras, and D. E. Newman. A probabilistic loading-dependent model of cascading failure and possible implications forblackouts. in Proc. 36th Annu. Hawaii Int. Conf. System Sciences, pp.1-10, 2003.

[14] I. Dobson, B. A. Carreras, and D. E. Newman. A branching processapproximation to cascading load-dependent system failure. in Proc. 37thAnnu. Hawaii Int. Conf. System Sciences, pp. 1-10, 2004.

[15] I. Dobson, B. A. Carreras, V. E. Lynch, and D. E. Newman. An initialmodel for complex dynamics in electric power system blackouts. in Proc.34th Annu. Hawaii Int. Conf. System Sciences, pp. 710-718, 2001.

[16] S. Mei, F. He, X. Zhang, S. Wu, and G. Wang. An improved OPA modeland blackout risk assessment. IEEE Transactions on Power Systems, vol.24, pp. 814-823, 2009.

[17] M. J. Eppstein and P. Hines. A ”Random Chemistry” Algorithm forIdentifying Collections of Multiple Contingencies That Initiate CascadingFailure. IEEE Transactions on Power Systems, vol. 27, pp. 1698-1705,2012.

[18] B. A. Carreras, D. E. Newman and I. Dobson. North American BlackoutTime Series Statistics and Implications for Blackout Risk. IEEE Transac-tions on Power Systems, vol. 31, pp. 4406-4414, 2016.

[19] Y. Yang, X. Guan and Q. Zhai. Fast Grid Security Assessment withN-k Contingencies. IEEE Transactions on Power Systems, vol. 32, pp.2193-2203, 2017.

[20] T. Ding, C. Li, C. Yan, F. Li, and Z. Bie. A Bilevel Optimization Modelfor Risk Assessment and Contingency Ranking in Transmission SystemReliability Evaluation. IEEE Transactions on Power Systems, vol. 32,pp. 3803-3813, 2017.

[21] J. Guo, F. Liu, J. Wang, J. Lin, and S. Mei. Towards Efficient CascadingOutage Simulation and Probability Analysis in Power Systems. IEEETransactions on Power Systems, vol. 33 pp. 2370-2382, 2018.

[22] J. Yan, H. He, X. Zhong, and Y. Tang. Q-Learning-Based VulnerabilityAnalysis of Smart Grid Against Sequential Topology Attacks. IEEETransactions on Information Forensics and Security, vol. 12, pp. 200-210,2017.

[23] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, and A. Huang,et al. Mastering the game of Go without human knowledge. Nature, vol.550, pp. 354-359, 2017.

[24] D. P. Berteskas. Dynamic Programming and Optimal Control, vol. II,4th edition. Athena Scientific, 2012.

[25] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction,2nd edition. pp. 27, MIT Press, 2018.

[26] S. J. Pan, and Q. Yang. A Survey on Transfer Learning. IEEE Trans-actions on Knowledge and Data Engineering, vol. 22, pp. 1345-1359,2010.

[27] P. Henneaux, et al. Benchmarking Quasi-Steady State Cascading OutageAnalysis Methodologies. 2018 IEEE International Conference on Proba-bilistic Methods Applied to Power Systems (PMAPS), Boise, ID, 2018,pp. 1-6.

[28] S. Mei, Y. Ni, G. Wang, and S. Wu. A Study of Self-Organized Criticalityof Power System Under Cascading Failures Based on AC-OPF withVoltage Stability Margin. IEEE Transactions on Power Systems, vol. 23,pp. 1719-1726, 2008.

Zhimei Zhang (S’16) received the B.E. degreein electrical engineering from Tsinghua University,Beijing, China, in 2016, where he is currently work-ing toward the Ph.D. degree in electrical engineering.His research interest includes computational analysisof power systems.

Shaowei Huang received the B.S. and Ph.D. de-grees from the Department of Electrical Engineering,Tsinghua University, Beijing, China, in July 2006and June 2011, respectively. From 2011 to 2013,he worked as a post-doctoral in the Department ofElectrical Engineering, Tsinghua University. He iscurrently an Associate Professor in the Departmentof Electrical Engineering, Tsinghua University. Hisresearch interests include power systems modelingand simulationpower system parallel and distributedcomputing, complex systems and its application in

power systems, artificial intelligence.

Ying Chen (M’07) received the B.E. and Ph.D.degrees in electrical engineering from Tsinghua Uni-versity, Beijing, China, in 2001 and 2006, respec-tively, both in electrical engineering. He is currentlyan associate professor with the Department of Elec-trical Engineering, Tsinghua University. His researchinterests include parallel and distributed computing,electromagnetic transient simulation, cyber-physicalsystem modeling, and cyber security of the smartgrid.

Shengwei Mei (SM’06–F’14) received the B.S.degree in mathematics from Xinjiang University,Urumuqi, China, in 1984, the M.S. degree in op-erations research from Tsinghua University, Beijing,China, in 1989, and the Ph.D. degree in automaticcontrol from the Chinese Academy of Sciences,Beijing, China, in 1996. He is currently a Profes-sor with the Department of Electrical Engineering,Tsinghua University. His research interests includepower system analysis andcontrol, game theory andits application in power systems.

Page 12: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX …kaisun/papers/2019-TPWRS_Zhang... · 2019. 10. 27. · 2 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX 1st

12 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XX, NO. X, XXX 20XX

Rui Yao (S’12–M’17) received the B.S. degree (withdistinction) in 2011 and Ph.D. degree in 2016 inelectrical engineering at Tsinghua University, Bei-jing, China. He was a postdoctoral research associateat the University of Tennessee, Knoxville during2016–2018. He is currently a postdoctoral appointeeat Argonne National Laboratory. His research inter-ests include power system modeling and stabilityanalysis, resilience modeling and assessment, andhigh-performance computational methodologies.

Kai Sun (M’06–SM’13) received the B.S. degree inautomation and the Ph.D. degree in control scienceand engineering from Tsinghua University, Beijing,China, in 1999 and 2004, respectively. He conductedpostdoctoral studies at Arizona State University(2006-2007) and the University of Western Ontario(2005). He was a Project Manager in grid operationsand planning with EPRI, Palo Alto, CA, USA, from2007 to 2012. He is currently an Associate Professorwith the Department of EECS, the University ofTennessee, Knoxville, TN, USA.

His research interests include stability, dynamics and control ofpower grids,and other complex systems. Prof. Sun serves on the Editorial Boards ofthe IEEE Transactions on Smart Grid, IEEE ACCESS, and IET Generation,Transmission and Distribution.