machine learning using version spaces for a power distribution network fault diagnostician

Copyright © IFAC Artificial Intelligence in Real-Time Control, Delft, The Netherlands, 1992

MACHINE LEARNING USING VERSION SPACES FOR A POWER DISTRIBUTION NETWORK FAULT DIAGNOSTICIAN

J. Ypsilantis and H. Yee

Department of Electrical Engineering, University of Sydney, NSW 2006, Australia

Abstract: There has been much interest in the application of expert systems to a wide variety of power system problems. One important application is the diagnosis of electrical faults in power distribution systems. A common problem with expert systems is the 'knowledge acquisition bottleneck' which arises with the generation of rules for the expert system, and there is benefit in automating this procedure as much as possible. This paper presents a fault diagnostician which uses a version space to learn from data in a SCADA system. The end user specifies background knowledge for use by the version space algorithm, but other than this the procedure is automatic. A test system was implemented and evaluated with the aid of a distribution network simulator. The results of this evaluation are presented.

Keywords: power distribution, learning systems, supervisory control.

INTRODUCTION

In recent years, there has been much interest in the use of expert systems to provide assistance to operators in the supervision and control of electric power systems. An area of particular interest concerns the diagnosis of faults in distribution systems.

When a fault occurs due to a line clash, lightning strike or a fallen conductor, it must be cleared by protection as quickly as possible. In distribution networks, the protection usually comprises simple overcurrent prevention devices, such as fuses or overcurrent relays coupled with circuit breakers. More elaborate protection schemes incorporating distance relays, intertripping and phase sensitive relays are not often found at the distribution level for reasons of economy.

Ideally, when a fault occurs, protection operates to isolate only the faulted line or bus. In practice, it often happens that unnecessary protection operations cause inadvertent loss of supply to otherwise healthy sections of the network, giving rise to the need for rapid diagnosis and subsequent restoration.

The diagnosis of faults and the determination of protection misoperation are essentially heuristic procedures which may be carried out by an expert system, making use of data available from the supervisory control and data acquisition (SCADA) systems used in modern distribution systems.

In the application of expert systems to electrical fault diagnosis, a common problem, as with expert systems in general, is the 'knowledge acquisition bottleneck'. A major part of the overall development and maintenance effort lies in the acquisition and coding of knowledge structures. This is especially the case when knowledge acquisition is carried out using conventional methods, e.g. meetings and interviews with domain experts.

The knowledge acquisition task may be automated using machine learning techniques. Various machine learning algorithms have been devised, many of which induce knowledge from a supplied set of examples. These are particularly useful in the automation of knowledge acquisition for an expert system because a domain expert often finds it easier to cite examples of a concept rather than rules.

Modern process control and SCADA systems provide information about the plant being supervised. Plant operators are a further source of information, either directly or indirectly via the recording of operator actions in response to changes in the condition of the plant. All of this information may be used to create examples for use with a suitable machine learning algorithm.

This paper is concerned with the application of version spaces to automate the learning process for a distribution system fault diagnostician. The notion of a version space was introduced by Mitchell (1982).

Earlier work (Ypsilantis, 1991) describes a diagnosis technique which handled a class of particular cases using certain features of the post-fault network to discriminate between different faults. These features are either recorded by, or may be calculated from, data available in a typical SCADA. The features used, i.e.

• The last circuit breaker(s) to operate in response to the fault, and

• The system islands resulting from protection operation, and their respective states, i.e. live or dead,

were found to be effective and are used in the work described here.

A problem with using only features to discriminate between faults is that an exhaustive training phase is required before the diagnostician can perform well. This results in a requirement for a large number of examples. Another problem is that the features in the examples are specific to the fault conditions present at the time of recording. Although the examples facilitate good discrimination between faults, they may fail in the presence of noise.

In the work described in this paper, the above features are used in the creation of training examples for a version space algorithm. A restricted form of the version space algorithm is used to simplify the recorded features, the aim being twofold. Firstly, the number of examples stored is reduced because generalisations concerning them are induced. Secondly, the process of generalisation makes the diagnostician less sensitive to noisy input.

The diagnostician was trained and tested using a distribution system simulator. The ability of the diagnostician to locate faults was evaluated for faults generating high and low spurious protection activity, and the degree of generalisation achieved using different sets of background knowledge was evaluated. The results obtained indicate that the version space algorithm can induce diagnostic rules resulting in a reduction of the number of examples necessary for good diagnosis, and make the diagnostician less sensitive to noise. The data used is that normally available in a SCADA system, and the procedure is essentially unsupervised.

VERSION SPACES

The method of version spaces, as described in Mitchell (1982) and in Chapter 7 of Genesereth (1987), is a machine learning algorithm which maintains consistency of an evolving concept description using (continually updated) sets of examples and counter examples of the concept. A version space is the set of all relations over a given domain which describe previously revealed positive examples of a concept while describing none of the previously revealed negative (or counter) examples of the concept.

Machine learning may be accomplished via use of version spaces to create progressively better descriptions (a smaller set of relations) as more examples are revealed to the learning system. The version space is restricted or pruned with each example, until the subspace contains a set of relations that can classify the examples with some accuracy. If the concept can be clearly defined, the version space will eventually be pruned to a single relation which defines the concept precisely.

For any real problem, a version space will contain many elements. The manipulation of the version space can be quite cumbersome. Mitchell (1982) develops a method requiring manipulation of only the most specific and most general elements of the version space, i.e. the boundaries of the version space. This reduces the number of elements to be manipulated to typically two or three.

The accuracy of the resulting boundaries is sensitive to ambiguities or noise in the input data. When such data is used, it is possible that the boundaries will not coincide. However, the rules contained in the boundaries will usually converge around a subspace of the version space that best describes the examples seen.
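The boundary-set idea can be illustrated with a minimal sketch. It assumes a much simpler hypothesis language than the one used in this work: each hypothesis is a tuple of attribute values in which '?' leaves an attribute unconstrained. The attribute names and values (a breaker identifier and an island state) are hypothetical, and the sketch does not handle the inconsistent or noisy data discussed above.

```python
ANY = "?"  # wildcard: the attribute is unconstrained


def covers(h, x):
    """True if hypothesis h describes example x."""
    return all(a == ANY or a == v for a, v in zip(h, x))


def at_least_as_general(h1, h2):
    """True if h1 describes every example that h2 describes."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, domains):
    """Return the boundaries (S, G) after processing (example, is_positive) pairs.

    S is a single most specific hypothesis (None until a positive is seen);
    G is a list of maximally general hypotheses.
    """
    S = None
    G = [tuple(ANY for _ in domains)]
    for x, positive in examples:
        if positive:
            # Drop general hypotheses that miss the positive, then
            # minimally generalise S so that it covers the positive.
            G = [g for g in G if covers(g, x)]
            if S is None:
                S = tuple(x)
            else:
                S = tuple(s if s == v else ANY for s, v in zip(S, x))
        else:
            # Minimally specialise every general hypothesis that covers
            # the negative, keeping only those still consistent with S.
            candidates = []
            for g in G:
                if not covers(g, x):
                    if g not in candidates:
                        candidates.append(g)
                    continue
                for i, a in enumerate(g):
                    if a != ANY:
                        continue
                    for v in domains[i]:
                        h = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and (S is None or at_least_as_general(h, S)):
                            if h not in candidates:
                                candidates.append(h)
            # Keep only the maximally general candidates.
            G = [g for g in candidates
                 if not any(o != g and at_least_as_general(o, g) for o in candidates)]
    return S, G


# Hypothetical attributes: (last breaker to operate, state of the island)
DOMAINS = [("CB1", "CB2"), ("live", "dead")]
EXAMPLES = [
    (("CB1", "dead"), True),
    (("CB2", "live"), False),
    (("CB1", "live"), True),
]
S, G = candidate_elimination(EXAMPLES, DOMAINS)
```

In this toy run the two boundaries coincide on the single hypothesis ('CB1', '?'), which corresponds to the case of a clearly defined concept described above.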

The Algorithm Implemented

The formulation of machine learning using version space boundaries adopted in this work is based closely on the description in Genesereth (1987). Extensions beyond this formulation are:

Constrained variables. In Mitchell's original formulation, the non-variable attributes were constrained to a fixed value, e.g. 'red', rather than a type, e.g. 'colour'. Variable attributes were completely unconstrained. Genesereth and Nilsson's formulation allows the constraint of a variable to a type.
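As a small illustration of the difference, the sketch below generalises two attribute values to a constant, to a variable constrained to a type, or to an unconstrained variable. The type table and the tuple encoding are assumptions made for this example only; 'red' and 'colour' are taken from the text, the other values are hypothetical.

```python
# Hypothetical type table mapping attribute values to their types.
TYPE_OF = {"red": "colour", "blue": "colour", "round": "shape", "square": "shape"}


def generalise(a, b):
    """Return the least general description covering both values.

    ("const", v)   : attribute fixed to the value v
    ("var", t)     : variable constrained to the type t
    ("var", None)  : completely unconstrained variable
    """
    if a == b:
        return ("const", a)
    type_a, type_b = TYPE_OF.get(a), TYPE_OF.get(b)
    if type_a is not None and type_a == type_b:
        return ("var", type_a)
    return ("var", None)
```

With type constraints available, generalising 'red' and 'blue' yields a variable of type 'colour' instead of jumping straight to an unconstrained variable.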

Rule generation. The version space boundaries can be used to classify test examples. In this implementation, classification rules are produced by listing the specific boundary elements followed by the general boundary elements. This creates rules which are sorted in increasing generality. If a test example falls within the classification, it will satisfy these rules, otherwise it will not. In the case where the concept is not completely defined, i.e. the specific and general boundaries are not identical, there exists a possibility of misclassification.

The initial general boundary element contains unbound variables. As a result, it will classify any test example if it is used to produce a rule. It is necessary to suppress the use of the general boundary in the generation of rules until at least one negative example has been revealed, i.e. until at least one of the variables in each general boundary element has been constrained.
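A minimal sketch of this rule-generation step follows. It assumes the same simplified tuple representation with '?' as an unbound variable, which is not the representation used in the implementation described here; the function names are hypothetical.

```python
ANY = "?"  # unbound variable


def covers(rule, example):
    """True if the example satisfies the rule."""
    return all(a == ANY or a == v for a, v in zip(rule, example))


def make_rules(specific, general):
    """List specific boundary elements first, then general ones.

    A general boundary element is suppressed while all of its variables
    are still unbound, since it would classify every test example.
    """
    rules = list(specific)
    for g in general:
        constrained = any(a != ANY for a in g)
        if constrained and g not in rules:
            rules.append(g)
    return rules


def classify(rules, example):
    """True if the example satisfies at least one rule."""
    return any(covers(rule, example) for rule in rules)
```

For instance, `make_rules([("CB1", "dead")], [("?", "?")])` returns only the specific element, while replacing the general boundary with `[("CB1", "?")]` appends that element after the specific one.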

Binary relations. The algorithm (in particular the candidate generation section) was implemented in such a way as to allow the handling of binary relations as well as unary constraints. This allows the algorithm to work with more complex constraints.

The use of binary, ternary¹ and higher order background knowledge allows the algorithm to discover relations between the attributes, rather than simple type constraints on the attributes. An example of a useful binary relation, in the context of a distribution system, is the relationship between the direction of current flowing in a line, and the voltage drop in the line.

¹ The implementation of the algorithm in this work was restricted to binary background relations. However, it is possible to cast a given problem, where ternary and higher order relations are to be learned, into one requiring only binary relations.

Multiple concepts. In general, a learning diagnostician must accommodate several concepts. In this application, each diagnosis is treated as a potentially separate concept requiring a separate version space, and the implementation allows the automatic generation of extra version spaces as necessary. A new specific boundary is created when generalisation against at least one of the existing boundaries is not possible.

Some of the version spaces will not be generalised, i.e. they will simply contain the training examples used to initialise the specific boundaries. This ensures that any examples which cannot be incorporated into a rule will not be lost after training, i.e. all of the information in the original training set is retained.

Characteristics of SCADA data

SCADA systems contain information which may be used to generate positive examples, but it is difficult to generate negative ones. This limits the utility of the general boundary in a version space algorithm.

The general boundary may be used as a check to determine when a concept is fully described. SCADA data will rarely give rise to a clearly defined concept, in which case the boundaries will not become equivalent.

When the general boundary is not used, it is not possible to estimate how well a concept is defined at a given point. However, only positive examples are needed for training. General boundaries were not used in this work, i.e. the version space was implemented in generalisation-only form.

TESTING PROCEDURE

Training Set Generation

The simulator described in Teo (1990) was implemented under a SCADA MMI interface developed for a Sun workstation, and was used in the generation of training and test data. A fault diagnostician, developed for the SCADA MMI, was used in the evaluation of the machine learning algorithm. Figure 1 shows the data and control flow for the test environment. Figure 2 is the single line diagram of the distribution network used in the evaluation.

Fig. 1: Test environment (blocks include the distribution network simulator and the version space algorithm)

Fig. 2: Distribution network

Version space-based learning was applied to the induction of classification rules from the training examples. Generalisation results in rules which cover at least two examples. This in turn results in a reduction of the number of examples which need to be stored and searched during diagnosis, thus speeding up execution.

A training set for the algorithm was formed by simulation of each fault location on the distribution network without relay noise and with all breakers closed prior to the introduction of the fault. One of three cases resulted from each simulation:

1. The diagnostician could not offer a diagnosis. The example was added to the training set.

2. The diagnostician successfully recognised the fault, using a previously learned training example. This would occur, for example, for faults at either end of an electrically short line. In this event, the example was discarded.

3. The diagnostician presented an incorrect diagnosis using a previously learned training example, i.e. another fault, electrically close to the one simulated, had the same features. The example was added to the training set.

Following formation of the training set, each fault was again simulated and it was verified that the diagnostician provided correct diagnoses, i.e. it provided a list of hypotheses which included the correct fault location and in which the alternative hypotheses were electrically close to the actual fault simulated. 'Electrically close' was taken to be no further than two buses away from the actual fault.
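The correctness criterion, together with the near-miss and failure categories used in the evaluation, can be sketched as follows. This is an illustration under stated assumptions: every fault location is identified with a bus, 'buses away' is taken as the number of line sections on the shortest path, and the five-bus radial network is hypothetical. How distances to line faults were counted is not stated in the paper.

```python
from collections import deque


def bus_distance(adjacency, start, goal):
    """Number of line sections on the shortest path between two buses."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        bus, distance = queue.popleft()
        if bus == goal:
            return distance
        for neighbour in adjacency[bus]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return float("inf")  # the two buses are not connected


def classify_diagnosis(adjacency, actual, hypotheses, max_distance=2):
    """Return 'S' (success), 'NM' (near miss) or 'F' (failure or no diagnosis)."""
    if not hypotheses:
        return "F"
    if any(bus_distance(adjacency, actual, h) > max_distance for h in hypotheses):
        return "F"
    return "S" if actual in hypotheses else "NM"


# Hypothetical radial feeder: B1 - B2 - B3 - B4 - B5
NETWORK = {
    "B1": ["B2"],
    "B2": ["B1", "B3"],
    "B3": ["B2", "B4"],
    "B4": ["B3", "B5"],
    "B5": ["B4"],
}
```

For a fault at B1, the hypothesis list ['B1', 'B3'] counts as a success, ['B2', 'B3'] as a near miss, and ['B1', 'B4'] as a failure because B4 is three buses away.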

The training set so formed comprised only positive examples of faults. These were presented to the version space boundary algorithm with the modifications described above. A series of tests was performed in which the version space boundary algorithm was used to generalise the set of training examples. The set of training examples was input to the learning algorithm for several different sets of background knowledge.
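A generalisation-only learner of this kind can be sketched as below. This is an assumption-laden illustration, not the implementation used in this work: each specific boundary is represented as the set of background relations that hold for every example it has absorbed, generalisation is the intersection of those sets, and a new version space is opened whenever an example shares no relation with any existing boundary. The example fields and the two background relations (loosely modelled on background rules b and c of this paper) are hypothetical.

```python
def learn(examples, background):
    """Generalise positive examples only, opening new version spaces as needed."""
    spaces = []
    for example in examples:
        # Background relations that hold for this example.
        holds = {name for name, test in background.items() if test(example)}
        for space in spaces:
            common = space["relations"] & holds
            if common:
                # Generalise the specific boundary by dropping the
                # relations that the new example does not satisfy.
                space["relations"] = common
                space["examples"].append(example)
                break
        else:
            # No existing boundary can be generalised to cover the example.
            spaces.append({"relations": holds, "examples": [example]})
    return spaces


# Hypothetical background relations on the faulted element.
BACKGROUND = {
    "on_line_of_last_breaker": lambda ex: ex["fault"] == ex["last_breaker_line"],
    "fully_isolated_line": lambda ex: ex["fault"] in ex["isolated_lines"],
}

# Hypothetical positive training examples.
EXAMPLES = [
    {"fault": "L1", "last_breaker_line": "L1", "isolated_lines": {"L1"}},
    {"fault": "L2", "last_breaker_line": "L2", "isolated_lines": set()},
    {"fault": "B7", "last_breaker_line": "L4", "isolated_lines": set()},
]

spaces = learn(EXAMPLES, BACKGROUND)
```

Here the first two examples collapse into one rule built from a single background relation, while the third remains as a stored training example in its own version space. The greedy first-match strategy makes the result depend on the order of presentation, which is a simplification of this sketch.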

Background Knowledge

A set of background rules was devised for this problem. This comprised five rules, a to e, describing:


• a. a dead bus adjacent to the last breaker to open,

• b. a line on which the last breaker operated,

• c. a line which became fully isolated,

• d. a dead bus connected by a bus coupler to another bus, and

• e. an isolated line on the periphery of a dead island.

These background rules were used to create four sets of rules, referred to as Sets 1 through 4. Table 1 indicates the background rules present in each set. Set 1 is a subset of Set 2, which is a subset of Set 3, etc. This resulted in a range of generalities in the background knowledge.

In addition, in various parts of the testing, the original training examples were used as a 'control' case to allow comparison with Sets 1 to 4.

TABLE 1. Background Knowledge Sets

Set   Generality      Rules Present

1     Most Specific   a, b
2                     a, b, c
3                     a, b, c, d
4     Most General    a, b, c, d, e

RESULTS

The evaluation of the machine learning technique concentrated on three areas:

• The effectiveness of the learning algorithm.

• The ability to reject irrelevant background information.

• The performance of the induced knowledge when noise is introduced in the input.

Each fault was simulated, and the diagnostician was tested using the results of learning. The criterion for correctness of diagnosis described above was used here as well.

Rejection of Irrelevant Knowledge

Initial learning was made separately with each of the background knowledge sets to determine which types of background knowledge were not used in generalisation. Each resulting knowledge base was examined manually.

It was found that Set 1 and Set 2 produced distinct knowledge bases, with a greater degree of generalisation in the case of Set 2. Generalisation using Set 1 background knowledge resulted in 23 training examples being left in their original form, with 2 rules describing the remainder. In the case of Set 2, 15 training examples were left in their original form, and 3 rules described the rest.

Set 3 and Set 4 resulted in the same knowledge base as for Set 2, indicating that the additional background knowledge added by these sets, i.e. a dead bus with a coupler and a line on the edge of a dead island, was not found to be relevant.

Further tests used the knowledge bases produced with Sets 1 and 2.

Effectiveness of Learning

Each distinct knowledge base was used in a subsequent simulation of the original faults, again without relay noise. The resulting diagnosis was, in each case, classified as one of the following:

• Successful diagnosis.

• A near miss.

• Failure or no diagnosis.

The classification was made based on the list of hypothesised fault locations that the diagnostician produced in response to a simulation of the fault. A diagnosis was considered successful if the list of hypotheses included the faulted element and any other hypotheses were for faults no further than two buses away from the actual fault. A near miss was taken to be the case where each hypothesis was no further than two buses away from the fault but the actual fault was not included. A failed diagnosis was any case where the list of hypotheses included at least one fault location more than two buses away from the actual fault.

Table 2 summarises the results of this test.

TABLE 2. Effectiveness of Learning

Set    S    NM   F    Av. No. Firing

None   51   0    0    approx 1
1      48   0    3    1.63
2      45   0    6    2.33

'S': success, 'NM': near miss, 'F': failure or no diagnosis

In all cases, there was either a successful diagnosis, or a failure; there were no near misses. All of the failures occurred in the diagnosis of faults in the vicinity of the infeed buses.

It may be seen that the effectiveness of the knowledge base falls as the level of generalisation increases. This is expected, since the number of rules or training examples that fire increases with increasing generality, and thus the chance of misdiagnosis increases. This may clearly be seen from the average number of rules or training examples that fire.

Effects of Relay Noise

A series of tests to assess the performance of the diagnostician in the presence of noisy data was conducted using the same knowledge bases.

Fault locations resulting in various fault clearing mechanisms were chosen for the testing of the diagnostician. Three line fault and three bus fault locations were chosen. For each group of three, one was typical, one caused much breaker activity under no-noise conditions, and one resulted in isolation of an island around the fault.

Noise was simulated as a random error in the nominal trip time and pickup current settings of the relays, as described in Ypsilantis (1991). The error was set at up to 2% either side of the nominal value. This simulated a realistic level of noise and timing error.

Ten simulations were made at each fault location, for each distinct set of rules. The resulting diagnoses were classified as successful, near misses or failures as before. In addition, the original training set was tested under the same conditions for a comparison.

Table 3 summarises the results of tests under noise.

TABLE 3. Effect of Noise

Set    S    NM   F    Av. No. Firing

None   19   2    39   0.35
1      32   28   0    1.02
2      28   22   10   1.73

'S': success, 'NM': near miss, 'F': failure or no diagnosis

It may be seen that the level of noise has an adverse effect on the performance of the system when the original training set is used, i.e. without generalisation. The noise level, although small, resulted in a relatively severe disruption of the diagnostician's performance.

The knowledge bases from Sets 1 and 2 resulted in fair performance, however. In both cases, a significant number of successful diagnoses were made, with a similar number of near misses.

There was a significant rise in the number of failures in the case of Set 2 as compared to Set 1, i.e. there is a greater chance of misdiagnosis with increasing generality.

DISCUSSION

It was found that several training examples applied to other faults that were electrically close to the original one used in training. The original 51 fault locations resulted in 40 training examples.

In each of the cases where a rule was induced, the relation found comprised exactly one background knowledge element. In other words, the background knowledge needed for effective use contained sufficient detail that no further background knowledge was required to characterise the examples. There were no rules that included a conjunction of two or more background elements. This implies, at least for this application, that the knowledge engineer indirectly solves the problem in specifying the background knowledge. This may not be completely disadvantageous. While the background knowledge needs to be detailed, knowledge engineers need not concern themselves with the problem of relevance of the background knowledge, since the algorithm will determine this automatically.

As is evident in these results, greater generalisation may often be realised when more background knowledge is included. This results in greater 'compression' of the training examples, but at the same time increases the risk of misdiagnosis. In implementing such a learning algorithm, care must be taken to ensure that a good tradeoff between these conflicting features is found.

Finally, because the version space algorithm embodies the majority of generalise/specialise machine learning procedures, it would be reasonable to expect similar behaviour for any generalise/specialise algorithm. It is interesting to note that data intensive machine learning algorithms such as ID3 due to Quinlan (1986), rather than knowledge intensive ones, seem to be successful for problems in the field. Version spaces would be expected to perform well in much more supervised and controlled learning environments.

CONCLUSION

This paper presents a fault diagnostician which produces rules using a version space algorithm. The user only needs to specify a set of background rules for the version space to use. Learning is otherwise automatic.

The version space algorithm is useful in automatically determining relevant background knowledge. It is possible to implement a generic diagnostician which contains all possible background rules for a SCADA task. The version space algorithm will only use background knowledge which is relevant to the task.

The version space algorithm produces knowledge which is less sensitive to noise, and it reduces the size of the knowledge base.

ACKNOWLEDGEMENTS

This work was supported by an Australian Postgraduate Research Award and an Australian Electrical Supply Industry Research Board grant. The authors wish to thank Associate Professor Teo Cheng-Yu, of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, for use of the distribution system data.

REFERENCES

M. R. Genesereth and N. J. Nilsson (1987). Logical Foundations of Artificial Intelligence. Morgan Kaufmann, 1987.

T. M. Mitchell (1982). Generalization as search. Artificial Intelligence, 18, 203-226, 1982.

J. R. Quinlan (1986). Induction of decision trees. Machine Learning, 1, 81-106, 1986.

C. Y. Teo and T. W. Chan (1990). Development of computer-aided assessment for distribution protection. Power Engineering Journal, 4, 21-28, January 1990.

J. Ypsilantis, H. Yee, and C-Y Teo (1991). An adaptive, rule-based fault diagnostician for power distribution networks. March 1991. Submitted to IEE Proceedings Part C.
