
Copyright © IFAC Artificial Intelligence in Real-Time Control, Delft, The Netherlands, 1992

MACHINE LEARNING USING VERSION SPACES FOR A POWER DISTRIBUTION NETWORK FAULT DIAGNOSTICIAN

J. Ypsilantis and H. Yee
Department of Electrical Engineering, University of Sydney, NSW 2006, Australia

Abstract: There has been much interest in the application of expert systems to a wide variety of power system problems. One important application is the diagnosis of electrical faults in power distribution systems. A common problem with expert systems is the 'knowledge acquisition bottleneck' which arises with the generation of rules for the expert system, and there is benefit in automating this procedure as much as possible. This paper presents a fault diagnostician which uses a version space to learn from data in a SCADA system. The end user specifies background knowledge for use by the version space algorithm, but other than this the procedure is automatic. A test system was implemented and evaluated with the aid of a distribution network simulator. The results of this evaluation are presented.

Keywords: power distribution, learning systems, supervisory control.

INTRODUCTION

In recent years, there has been much interest in the use of expert systems to provide assistance to operators in the supervision and control of electric power systems. An area of particular interest concerns the diagnosis of faults in distribution systems.

When a fault occurs due to a line clash, lightning strike or a fallen conductor, it must be cleared by protection as quickly as possible. In distribution networks, the protection usually comprises simple overcurrent protection devices, such as fuses or overcurrent relays coupled with circuit breakers. More elaborate protection schemes incorporating distance relays, intertripping and phase-sensitive relays are not often found at the distribution level for reasons of economy.

Ideally, when a fault occurs, protection operates to isolate only the faulted line or bus. In practice, it often happens that unnecessary protection operations cause inadvertent loss of supply to otherwise healthy sections of the network, giving rise to the need for rapid diagnosis and subsequent restoration.

The diagnosis of faults and the determination of protection misoperation are essentially heuristic procedures which may be carried out by an expert system, making use of data available from the supervisory control and data acquisition (SCADA) systems used in modern distribution systems.

In the application of expert systems to electrical fault diagnosis, a common problem, as with expert systems in general, is the 'knowledge acquisition bottleneck'. A major part of the overall development and maintenance effort lies in the acquisition and coding of knowledge structures. This is especially the case when knowledge acquisition is carried out using conventional methods, e.g. meetings and interviews with domain experts.

The knowledge acquisition task may be automated using machine learning techniques. Various machine learning algorithms have been devised, many of which induce knowledge from a supplied set of examples. These are particularly useful in the automation of knowledge acquisition for an expert system because a domain expert often finds it easier to cite examples of a concept than to state rules.

Modern process control and SCADA systems provide information about the plant being supervised. Plant operators are a further source of information, either directly or indirectly via the recording of operator actions in response to changes in the condition of the plant. All of this information may be used to create examples for use with a suitable machine learning algorithm.

This paper is concerned with the application of version spaces to automate the learning process for a distribution system fault diagnostician. The notion of a version space was introduced by Mitchell (1982).

Earlier work (Ypsilantis et al., 1991) describes a diagnosis technique which handled a particular class of cases, using certain features of the post-fault network to discriminate between different faults. These features are either recorded by, or may be calculated from, data available in a typical SCADA system. The features used were:

• the last circuit breaker(s) to operate in response to the fault, and

• the system islands resulting from protection operation, and their respective states, i.e. live or dead.

These features were found to be effective and are also used in the work described here.
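Both features can be derived from post-fault breaker states and a time-stamped trip log. The sketch below is illustrative only: the `Branch` encoding, the `(time, breaker)` trip-log format, and the rule that an island is live exactly when it contains an infeed bus are assumptions, not details given in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """A line or bus coupler between two buses (hypothetical encoding)."""
    name: str
    bus_a: str
    bus_b: str
    closed: bool  # post-fault breaker status as reported by SCADA


def find_islands(buses, branches):
    """Group buses that remain connected through closed branches (union-find)."""
    parent = {b: b for b in buses}

    def root(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    for br in branches:
        if br.closed:
            parent[root(br.bus_a)] = root(br.bus_b)
    groups = {}
    for b in buses:
        groups.setdefault(root(b), set()).add(b)
    return [frozenset(g) for g in groups.values()]


def post_fault_features(buses, branches, infeeds, trip_log):
    """Return (last breaker(s) to operate, {island: 'live' or 'dead'}).

    trip_log: list of (time, breaker) events; infeeds: set of source buses.
    Assumption: an island is live if and only if it contains an infeed bus.
    """
    last_time = max(t for t, _ in trip_log)
    last = frozenset(cb for t, cb in trip_log if t == last_time)
    states = {isl: ("live" if isl & infeeds else "dead")
              for isl in find_islands(buses, branches)}
    return last, states
```

All breakers sharing the latest timestamp are reported, which covers the "breaker(s)" case of simultaneous operation.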

A problem with using only features to discriminate between faults is that an exhaustive training phase is required before the diagnostician can perform well. This results in a requirement for a large number of examples. Another problem is that the features in the examples are specific to the fault conditions present at the time of recording. Although the examples facilitate good discrimination between faults, they may fail in the presence of noise.

In the work described in this paper, the above features are used in the creation of training examples for a version space algorithm. A restricted form of the version space algorithm is used to simplify the recorded features, the aim being twofold. Firstly, the number of examples stored is reduced because generalisations concerning them are induced. Secondly, the process of generalisation makes the diagnostician less sensitive to noisy input.

The diagnostician was trained and tested using a distribution system simulator. The ability of the diagnostician to locate faults was evaluated for faults generating high and low spurious protection activity, and the degree of generalisation achieved using different sets of background knowledge was evaluated. The results obtained indicate that the version space algorithm can induce diagnostic rules that reduce the number of examples needed for good diagnosis and make the diagnostician less sensitive to noise. The data used is that normally available in a SCADA system, and the procedure is essentially unsupervised.


VERSION SPACES

The method of version spaces, as described in Mitchell (1982) and in Chapter 7 of Genesereth and Nilsson (1987), is a machine learning algorithm which maintains the consistency of an evolving concept description using continually updated sets of examples and counter-examples of the concept. A version space is the set of all relations over a given domain which describe all previously revealed positive examples of a concept while describing none of the previously revealed negative (or counter-) examples of the concept.
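To make the definition concrete, the sketch below enumerates a version space by brute force. It assumes a deliberately simple hypothesis language, a conjunction of attribute tests in which each slot is either a constant or `?` (any value), rather than the general relations of the text; the attribute names are hypothetical.

```python
from itertools import product

ANY = "?"  # unconstrained slot


def covers(h, x):
    """A conjunctive hypothesis covers an example if every slot is '?' or equal."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))


def version_space(domains, positives, negatives):
    """All hypotheses covering every positive example and no negative example."""
    hypotheses = product(*[tuple(d) + (ANY,) for d in domains])
    return [h for h in hypotheses
            if all(covers(h, p) for p in positives)
            and not any(covers(h, n) for n in negatives)]


# Hypothetical attributes: (last breaker to open, state of the adjacent bus)
domains = [("CB1", "CB2"), ("dead", "live")]
print(version_space(domains, [("CB1", "dead")], [("CB2", "dead")]))
# prints [('CB1', 'dead'), ('CB1', '?')]
```

The number of candidates grows as the product of the domain sizes, which is why practical implementations avoid enumerating the version space explicitly.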

Machine learning may be accomplished by using version spaces to create progressively better descriptions (a smaller set of relations) as more examples are revealed to the learning system. The version space is restricted, or pruned, with each example, until the subspace contains a set of relations that can classify the examples with some accuracy. If the concept can be clearly defined, the version space will eventually be pruned to a single relation which defines the concept precisely.

For any real problem, a version space will contain many elements, and manipulating it directly can be quite cumbersome. Mitchell (1982) develops a method requiring manipulation of only the most specific and most general elements of the version space, i.e. the boundaries of the version space. This reduces the number of elements to be manipulated to typically two or three.
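The boundary idea can be sketched with a candidate-elimination style update for the same conjunctive hypothesis language (constants or `?`). This is a textbook-style illustration under that simplifying assumption, not the Genesereth and Nilsson formulation adopted in this work; `None` stands for the empty hypothesis, which covers nothing.

```python
ANY = "?"


def covers(h, x):
    """h covers x; None is the empty hypothesis, which covers nothing."""
    return h is not None and all(a == ANY or a == b for a, b in zip(h, x))


def at_least_as_general(h1, h2):
    """True if h1 covers every example that h2 covers."""
    if h2 is None:
        return True
    return h1 is not None and all(a == ANY or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, domains):
    """Update only the S (most specific) and G (most general) boundaries.

    examples: list of (attribute tuple, is_positive) pairs.
    Returns (S, G); (None, []) means the boundaries have crossed, i.e. no
    hypothesis in this language is consistent with all the examples.
    """
    S, G = None, [tuple(ANY for _ in domains)]
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            # Minimal generalisation of S that covers x.
            S = tuple(x) if S is None else tuple(
                s if s == v else ANY for s, v in zip(S, x))
            if not any(at_least_as_general(g, S) for g in G):
                return None, []
        else:
            if covers(S, x):
                return None, []
            specialised = []
            for g in G:
                if not covers(g, x):
                    specialised.append(g)
                    continue
                # Minimal specialisations of g that exclude x but stay above S.
                for i, dom in enumerate(domains):
                    if g[i] == ANY:
                        for v in dom:
                            h = g[:i] + (v,) + g[i + 1:]
                            if v != x[i] and at_least_as_general(h, S):
                                specialised.append(h)
            specialised = list(dict.fromkeys(specialised))  # drop duplicates
            # Keep only the maximally general members.
            G = [g for g in specialised
                 if not any(h != g and at_least_as_general(h, g) for h in specialised)]
    return S, G
```

With the positive examples (CB1, dead) and (CB1, live) and the negative example (CB2, dead), both boundaries converge to (CB1, ?), the single-relation case described earlier. Presenting the same example as both positive and negative returns (None, []).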

The accuracy of the resulting boundaries is sensitive to ambiguities or noise in the input data. When such data is used, it is possible that the boundaries will not coincide. However, the rules contained in the boundaries will usually converge around a subspace of the version space that best describes the examples seen.

The Algorithm Implemented

The formulation of machine learning using version space boundaries adopted in this work is based closely on the description in Genesereth and Nilsson (1987). The main features of the implementation, including extensions beyond this formulation, are:

Constrained variables. In Mitchell's original formulation, the non-variable attributes were constrained to a fixed value, e.g. 'red', rather than to a type, e.g. 'colour'. Variable attributes were completely unconstrained. Genesereth and Nilsson's formulation allows a variable to be constrained to a type.
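The effect of type constraints on generalisation can be sketched as a climb up a small hierarchy: constant, then type, then unconstrained. The type table below is hypothetical and simply reuses the colour example from the text; it is an illustration, not the implementation used in this work.

```python
ANY = "?"  # completely unconstrained variable

# Hypothetical type table: each constant belongs to exactly one type.
TYPE_OF = {"red": "colour", "green": "colour", "square": "shape", "circle": "shape"}


def generalise_term(a, b):
    """Least general term covering both a and b: constant, then type, then '?'."""
    if a == b:
        return a
    type_a, type_b = TYPE_OF.get(a, a), TYPE_OF.get(b, b)
    return type_a if type_a == type_b else ANY


def term_covers(term, value):
    """True if a constant, a type or '?' admits the given constant."""
    return term in (ANY, value) or TYPE_OF.get(value) == term


def generalise(h, x):
    """Generalise a hypothesis slot by slot so that it also covers example x."""
    return tuple(generalise_term(a, b) for a, b in zip(h, x))
```

Without the type table, as in Mitchell's original formulation, 'red' and 'green' could only be generalised to a completely unconstrained variable; with it they generalise to 'colour'.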

Rule generation. The version space boundaries can be used to classify test examples. In this implementation, classification rules are produced by listing the specific boundary elements followed by the general boundary elements. This creates rules which are sorted in increasing generality. If a test example falls within the classification, it will satisfy at least one of these rules; otherwise it will not. In the case where the concept is not completely defined, i.e. the specific and general boundaries are not identical, there exists a possibility of misclassification.

The initial general boundary element contains only unbound variables. As a result, if it were used to produce a rule, it would classify any test example as an instance of the concept. It is therefore necessary to suppress the use of the general boundary in the generation of rules until at least one negative example has been revealed, i.e. until at least one of the variables in each general boundary element has been constrained.
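A minimal sketch of this rule ordering, again assuming conjunctive hypotheses in which `?` marks an unbound variable. Counting unbound slots as the measure of generality, and returning the first matching rule, are assumptions made for illustration.

```python
ANY = "?"  # unbound variable


def free_slots(h):
    """Count unbound slots, used here as a rough measure of generality."""
    return sum(1 for v in h if v == ANY)


def make_rules(S, G):
    """Specific-boundary elements first, then general-boundary elements.

    The general boundary is withheld until every element of it has at least
    one constrained slot, i.e. until negative examples have specialised it.
    """
    rules = sorted(S, key=free_slots)
    if G and all(free_slots(g) < len(g) for g in G):
        rules += sorted(G, key=free_slots)
    return rules


def first_match(rules, x):
    """Return the first (most specific) rule that covers x, or None."""
    for r in rules:
        if all(a == ANY or a == b for a, b in zip(r, x)):
            return r
    return None
```

While the general boundary is still the fully unbound initial element, only the specific boundary contributes rules, so unseen examples are left unclassified rather than accepted wholesale.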

Binary relations. The algorithm (in particular the candidate generation section) was implemented in such a way as to allow the handling of binary relations as well as unary constraints. This allows the algorithm to work with more complex constraints.

The use of binary, ternary¹ and higher-order background knowledge allows the algorithm to discover relations between the attributes, rather than simple type constraints on the attributes. An example of a useful binary relation, in the context of a distribution system, is the relationship between the direction of current flowing in a line and the voltage drop in the line.

¹ The implementation of the algorithm in this work was restricted to binary background relations. However, it is possible to cast a given problem, where ternary and higher-order relations are to be learned, into one requiring only binary relations.

Multiple concepts. In general, a learning diagnostician must accommodate several concepts. In this application, each diagnosis is treated as a potentially separate concept requiring a separate version space, and the implementation allows the automatic generation of extra version spaces as necessary. A new specific boundary is created when an example cannot be generalised against any of the existing boundaries.

Some of the version spaces will not be generalised, i.e. they will simply contain the training examples used to initialise the specific boundaries. This ensures that any examples which cannot be incorporated into a rule are not lost after training, i.e. all of the information in the original training set is retained.

Version space-based learning was applied to the induction of classification rules from the training examples. Generalisation results in rules which cover at least two examples. This in turn reduces the number of examples which need to be stored and searched during diagnosis, thus speeding up execution.

Characteristics of SCADA data

SCADA systems contain information which may be used to generate positive examples, but it is difficult to generate negative ones. This limits the utility of the general boundary in a version space algorithm.

The general boundary may be used as a check to determine when a concept is fully described. SCADA data will rarely give rise to a clearly defined concept, in which case the boundaries will not become equivalent.

When the general boundary is not used, it is not possible to estimate how well a concept is defined at a given point. However, only positive examples are needed for training. General boundaries were not used in this work, i.e. the version space was implemented in generalisation-only form.

TESTING PROCEDURE

Training Set Generation

The simulator described in Teo and Chan (1990) was implemented under a SCADA man-machine interface (MMI) developed for a Sun workstation, and was used in the generation of training and test data. A fault diagnostician, developed for the SCADA MMI, was used in the evaluation of the machine learning algorithm. Figure 1 shows the data and control flow for the test environment. Figure 2 is the single line diagram of the distribution network used in the evaluation.

Fig. 1: Test environment (diagram labels include Distribution Network Simulator and Version Space Algorithm)

Fig. 2: Distribution network

A training set for the algorithm was formed by simulation of each fault location on the distribution network without relay noise and with all breakers closed prior to the introduction of the fault. One of three cases resulted from each simulation:

1. The diagnostician could not offer a diagnosis. The example was added to the training set.

2. The diagnostician successfully recognised the fault, using a previously learned training example. This would occur, for example, for faults at either end of an electrically short line. In this event, the example was discarded.

3. The diagnostician presented an incorrect diagnosis using a previously learned training example, i.e. another fault, electrically close to the one simulated, had the same features. The example was added to the training set.

Following formation of the training set, each fault was again simulated and it was verified that the diagnostician provided correct diagnoses, i.e. it provided a list of hypotheses which included the correct fault location and in which the alternative hypotheses were electrically close to the actual fault simulated. 'Electrically close' was taken to be no further than two buses away from the actual fault.
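The acceptance test above, together with the three-way grading (success, near miss, failure) used later in the Results section, can be sketched as follows. For illustration only, every fault location is assumed to be a node of a hypothetical adjacency map, so that "buses away" becomes a breadth-first hop count; the paper does not say how this distance was measured for line faults.

```python
from collections import deque


def hops(adjacency, start, goal):
    """Fewest node-to-node steps from start to goal (breadth-first search)."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("inf")


def grade_diagnosis(hypotheses, actual, adjacency, max_hops=2):
    """Grade a list of hypothesised fault locations against the simulated fault.

    'success'  : actual fault listed, and every hypothesis within max_hops
    'near miss': every hypothesis within max_hops, but actual fault not listed
    'failure'  : no diagnosis, or any hypothesis further than max_hops away
    """
    if not hypotheses or any(hops(adjacency, h, actual) > max_hops for h in hypotheses):
        return "failure"
    return "success" if actual in hypotheses else "near miss"
```

Under this sketch, the verification step described above amounts to requiring `grade_diagnosis(...) == "success"` for every simulated fault.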

The training set so formed comprised only positive examples of faults. These were presented to the version space boundary algorithm with the modifications described above. A series of tests was performed in which the version space boundary algorithm was used to generalise the set of training examples. The set of training examples was input to the learning algorithm for several different sets of background knowledge.
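A heavily simplified sketch of generalisation-only learning from positive examples with multiple specific boundaries follows. Each boundary records the background relations that hold for all of its members. A new example joins the first boundary with which it shares at least one relation; otherwise it starts a new boundary in its original, specific form. The example encoding, the two predicates (loosely modelled on background rules b and c listed below), and the greedy first-fit merge are assumptions for illustration, not the authors' procedure.

```python
def learn(examples, background):
    """Generalisation-only learning from positive examples.

    examples:   list of dicts describing post-fault cases (hypothetical encoding)
    background: dict mapping a relation name to a predicate over one example
    Returns (rules, leftovers): rules are (relation names, number of examples
    covered); leftovers are examples kept in their original, specific form.
    """
    boundaries = []  # each entry: [relations shared by all members, members]
    for ex in examples:
        holds = {name for name, pred in background.items() if pred(ex)}
        for entry in boundaries:
            shared = entry[0] & holds
            if shared:  # generalise this boundary so that it also covers ex
                entry[0] = shared
                entry[1].append(ex)
                break
        else:  # no existing boundary can absorb ex: start a new specific one
            boundaries.append([holds, [ex]])
    rules = [(frozenset(rels), len(members))
             for rels, members in boundaries if len(members) > 1]
    leftovers = [members[0] for _, members in boundaries if len(members) == 1]
    return rules, leftovers


# Hypothetical binary relations between the fault location and post-fault features.
BACKGROUND = {
    "line_of_last_breaker": lambda ex: ex["fault"] in ex["last_breaker_lines"],
    "isolated_line": lambda ex: ex["fault"] in ex["isolated_lines"],
}
```

In this sketch a relation that never survives into any rule was not needed to characterise the examples, which mirrors the automatic rejection of irrelevant background knowledge discussed in the Results section.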

Background Knowledge

A set of background rules was devised for this problem. It comprised five rules, a to e, describing:


• a. a dead bus adjacent to the last breaker to open,

• b. a line on which the last breaker operated,

• c. a line which became fully isolated,

• d. a dead bus connected by a bus coupler to another bus, and

• e. an isolated line on the periphery of a dead island.

These background rules were used to create four sets of rules, referred to as Sets 1 to 4. Table 1 indicates the background rules present in each set. Set 1 is a subset of Set 2, which is a subset of Set 3, and so on. This resulted in a range of generalities in the background knowledge.

In addition, in various parts of the testing, the original training examples were used as a 'control' case to allow comparison with Sets 1 to 4.

TABLE 1. Background Knowledge Sets

Set | Generality    | Rules Present
--- | ------------- | -------------
1   | Most specific | a, b
2   |               | a, b, c
3   |               | a, b, c, d
4   | Most general  | a, b, c, d, e

RESULTS

The evaluation of the machine learning technique concentrated on three areas:

• The effectiveness of the learning algorithm.

• The ability to reject irrelevant background information.

• The performance of the induced knowledge when noise is introduced in the input.

Each fault was simulated, and the diagnostician was tested using the results of learning. The criterion for correctness of diagnosis described above was used here as well.

Rejection of Irrelevant Knowledge

Initial learning was carried out separately with each of the background knowledge sets to determine which types of background knowledge were not used in generalisation. Each resulting knowledge base was examined manually.

It was found that Set 1 and Set 2 produced distinct knowledge bases, with a greater degree of generalisation in the case of Set 2. Generalisation using Set 1 background knowledge resulted in 23 training examples being left in their original form, with 2 rules describing the remainder. In the case of Set 2, 15 training examples were left in their original form, and 3 rules described the rest.

Set 3 and Set 4 resulted in the same knowledge base as Set 2, indicating that the additional background knowledge added by these sets (a dead bus with a coupler, and a line on the edge of a dead island) was not found to be relevant.

Further tests used the knowledge bases produced with Sets 1 and 2.

Effectiveness of Learning

Each distinct knowledge base was used in a subsequent simulation of the original faults, again without relay noise. The resulting diagnosis was, in each case, classified as one of the following:

• Successful diagnosis.

• A near miss.

• Failure or no diagnosis.

The classification was made based on the list of hypothesised fault locations that the diagnostician produced in response to a simulation of the fault. A diagnosis was considered successful if the list of hypotheses included the faulted element and any other hypotheses were for faults no further than two buses away from the actual fault. A near miss was taken to be the case where each hypothesis was no further than two buses away from the fault but the actual fault was not included. A failed diagnosis was any case where the list of hypotheses included at least one fault location more than two buses away from the actual fault.

Table 2 summarises the results of this test.

TABLE 2. Effectiveness of Learning

Set  | S  | NM | F | Av. No. Firing
---- | -- | -- | - | --------------
None | 51 | 0  | 0 | approx. 1
1    | 48 | 0  | 3 | 1.63
2    | 45 | 0  | 6 | 2.33

'S': success, 'NM': near miss, 'F': failure or no diagnosis; 'Av. No. Firing': average number of rules or training examples that fire; 'None': the original training set, without generalisation.

In all cases, there was either a successful diagnosis or a failure; there were no near misses. All of the failures occurred in the diagnosis of faults in the vicinity of the infeed buses.

It may be seen that the effectiveness of the knowledge base falls as the level of generalisation increases. This is expected, since the number of rules or training examples that fire increases with increasing generality, and thus the chance of misdiagnosis increases. This may clearly be seen from the average number of rules or training examples that fire.

Effects of Relay Noise

A series of tests to assess the performance of the diagnostician in the presence of noisy data was conducted using the same knowledge bases.

Noise was simulated as a random error in the nominal trip time and pickup current settings of the relays, as described in Ypsilantis et al. (1991). The error ranged up to 2% either side of the nominal value. This simulated a realistic level of noise and timing error.

Fault locations resulting in various fault clearing mechanisms were chosen for the testing of the diagnostician. Three line fault and three bus fault locations were chosen. For each group of three, one was typical, one caused much breaker activity under no-noise conditions, and one resulted in isolation of an island around the fault.

Ten simulations were made at each fault location, for each distinct set of rules. The resulting diagnoses were classified as successful, near misses or failures as before. In addition, the original training set was tested under the same conditions for comparison.

Table 3 summarises the results of tests under noise.

TABLE 3. Effect of Noise

Set  | S  | NM | F  | Av. No. Firing
---- | -- | -- | -- | --------------
None | 19 | 2  | 39 | 0.35
1    | 32 | 28 | 0  | 1.02
2    | 28 | 22 | 10 | 1.73

'S': success, 'NM': near miss, 'F': failure or no diagnosis; 'Av. No. Firing': average number of rules or training examples that fire; 'None': the original training set, without generalisation.

It may be seen that noise has an adverse effect on the performance of the system when the original training set is used, i.e. without generalisation. The noise level, although small, resulted in a relatively severe disruption of the diagnostician's performance.

The knowledge bases from Sets 1 and 2 resulted in fair performance, however. In both cases, a significant number of successful diagnoses were made, with a similar number of near misses.

There was a significant rise in the number of failures in the case of Set 2 as compared to Set 1, i.e. there is a greater chance of misdiagnosis with increasing generality.

DISCUSSION

It was found that several training examples applied to other faults that were electrically close to the original one used in training. The original 51 fault locations resulted in 40 training examples.

In each of the cases where a rule was induced, the relation found comprised exactly one background knowledge element. In other words, the background knowledge needed for effective use contained sufficient detail that no further background knowledge was required to characterise the examples. There were no rules that included a conjunction of two or more background elements. This implies, at least for this application, that the knowledge engineer indirectly solves the problem in specifying the background knowledge. This may not be completely disadvantageous. While the background knowledge needs to be detailed, knowledge engineers need not concern themselves with the problem of relevance of the background knowledge, since the algorithm will determine this automatically.

As is evident in these results, greater generalisation may often be realised when more background knowledge is included. This results in greater 'compression' of the training examples, but at the same time increases the risk of misdiagnosis. In implementing such a learning algorithm, care must be taken to ensure that a good trade-off between these conflicting effects is found.

Finally, because the version space algorithm embodies the majority of generalise/specialise machine learning procedures, it would be reasonable to expect similar behaviour from any generalise/specialise algorithm. It is interesting to note that data-intensive machine learning algorithms, such as ID3 due to Quinlan (1986), rather than knowledge-intensive ones, seem to be successful for problems in this field. Version spaces would be expected to perform well in much more supervised and controlled learning environments.

CONCLUSION

This paper presents a fault diagnostician which produces rules using a version space algorithm. The user only needs to specify a set of background rules for the version space to use. Learning is otherwise automatic.

The version space algorithm is useful in automatically determining relevant background knowledge. It is possible to implement a generic diagnostician which contains all possible background rules for a SCADA task. The version space algorithm will only use background knowledge which is relevant to the task.

The version space algorithm produces knowledge which is less sensitive to noise, and it reduces the size of the knowledge base.

ACKNOWLEDGEMENTS

This work was supported by an Australian Postgraduate Research Award and an Australian Electrical Supply Industry Research Board grant. The authors wish to thank Associate Professor Teo Cheng-Yu, of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, for use of the distribution system data.

REFERENCES

M. R. Genesereth and N. J. Nilsson (1987). Logical Foundations of Artificial Intelligence. Morgan Kaufmann.

T. M. Mitchell (1982). Generalization as search. Artificial Intelligence, 18, 203-226.

J. R. Quinlan (1986). Induction of decision trees. Machine Learning, 1, 81-106.

C. Y. Teo and T. W. Chan (1990). Development of computer-aided assessment for distribution protection. Power Engineering Journal, i, 21-28, January 1990.

J. Ypsilantis, H. Yee, and C.-Y. Teo (1991). An adaptive, rule-based fault diagnostician for power distribution networks. Submitted to IEE Proceedings Part C, March 1991.