[ieee 2007 ieee workshop on machine learning for signal processing - thessaloniki, greece...

Geometrical Kernel Machine for Prediction and Novelty Detection of Disruptive Events in TOKAMAK Machines

Barbara Cannas, Rita Delogu, Alessandra Fanni, Member, IEEE, Augusto Montisci, Piergiorgio Sonato, and Maria Katiuscia Zedda

Abstract- This paper presents a so-called Geometrical Kernel Machine used to predict disruptive events in nuclear fusion reactors. Here, the prediction problem is modeled as a two-class classification problem, and the predictor is built by using a new constructive algorithm that automatically determines both the number of neurons and the synaptic weights of a Multilayer Perceptron network with a single hidden layer. It has been demonstrated that the resulting network is able to classify any finite set of patterns defined in a real domain. The geometrical interpretation of the network equations allows us both to develop the predictor and to manage the so-called ageing of the kernel machine. In fact, using the same kernel machine, a novelty detection system has been integrated into the predictor, increasing the overall system performance.

I. INTRODUCTION

THE interest in nuclear fusion devices as power generators has been growing over the last decades.

Although none of the operating nuclear fusion devices has produced net energy, as they operate as experimental machines, it is fundamental to study and understand the process in view of future developments.

One of the drawbacks of these experimental devices is the occurrence of disruptive events inside the plasma, which pose serious limits to both the machine's lifetime and its economic feasibility. During a disruption the plasma loses its confinement, and the plasma-facing components are then subject to forces of several tons.

In order to avoid or mitigate disruptive events, a number of disruption prediction techniques have been developed [1-3]. In many cases neural network approaches are used, and they seem to be the most suitable to predict the event or, more precisely, to build an impending disruption warning indicator.

In this paper, a new kernel machine, based on the geometrical synthesis of MultiLayer Perceptron (MLP) neural networks [4], is applied to a disruption database taken from JET, the biggest experimental fusion reactor in the world. The database, used to train and test the kernel machine, contains several diagnostic signals that characterize a disruptive event (disruptive pulse) or a non-disruptive event (safe pulse). The diagnostic signals are selected in order to maximize the prediction capability of the system and to be as machine-independent as possible.

The aim of prediction is to identify an incoming disruption well before it occurs, in order to give the mitigation system enough time to operate. The problem is modeled as a classification problem: the kernel machine is trained to classify patterns coming from the database as disruptive (i.e., containing disruption precursors) or non-disruptive.

An important aspect to take into account is the ageing of kernel machines: once completely new patterns are presented to them, their performance degrades severely. Those new patterns can come from unexplored regions of the device's operational space, so one way to overcome this drawback is to map those new regions with novelty detection techniques. In the present paper, the same geometrical interpretation of the neural network is used to perform the novelty detection.

Manuscript received April 3, 2007. This work, supported in part by the Euratom Communities under the contract of Association between EURATOM/ENEA, was carried out within the framework of the European Fusion Development Agreement. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

B. Cannas, R. Delogu, A. Fanni, A. Montisci, and M. K. Zedda are with the Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy (A. Montisci is the corresponding author: phone: +39 070 675 5848; fax: +39 070 675 5900; e-mail: augusto.montisci@diee.unica.it).

P. Sonato is with the Consorzio RFX, Associazione Euratom-ENEA sulla Fusione, Corso Stati Uniti, 4, I-35127 Padova, Italy.

The performance of the proposed approach is reported in terms of False Alarms, i.e., wrong predictions on safe pulses, and Missed Alarms, i.e., wrong predictions on disruptive pulses.

II. STATE OF THE ART FOR DISRUPTION PREDICTION

Several papers in the literature describe the operational limits of a tokamak and its theoretical stability. Nevertheless, none of them has led to the development of a reliable predictive model of disruptions, due to the large number of parameters involved. For this reason, over the last 15 years there have been several studies on the prediction of disruptions using neural networks. In particular, some papers focus on predicting the proximity of the disruption by using an artificial output.

The first application of neural networks to real-time tokamak control is reported in [5]. The network is trained with a database representing the plasma position in the tokamak experiment COMPASS. The results showed that the network responds in close agreement with the signals obtained from the device.

1-4244-1566-7/07/$25.00 ©2007 IEEE.

In [6] the authors proposed an approach based on a Bayesian probability assessment of the disruptive phenomenon, in which the Bayesian probability is modeled by an MLP neural network. The model investigates whether a set of parameters is useful as an indicator of an incoming disruption.

An artificial neural network was used in [7] to estimate a disruption boundary in the DIII-D tokamak. The authors combined signals from a large number of plasma diagnostics, demonstrating that the network, so trained, provided a much more accurate prediction of the disruption boundary than the traditional Troyon limit.

In [8] the authors computed the remaining time to disruption to indicate the stability level of the discharge. The neural network is specified, trained and then implemented within the real-time plasma control system.

In [1] an on-line predictor of the time to disruption installed on the ASDEX Upgrade tokamak is presented. The prediction system uses a neural network trained on eight plasma parameters and some of their time derivatives, extracted from 99 disruptive discharges. The system was implemented and tested for real-time mitigation, showing satisfactory prediction capabilities. However, the authors highlighted the deterioration of the network performance in on-line tests, due to slight differences between the real-time signals and the stored ones. Moreover, new experiments belonging to operational spaces different from those used for training are not well predicted in the on-line implementation, thus exhibiting the so-called 'ageing' of the neural network.

Some major disruptions have been investigated in [3]. The 'stability level' proposed in the paper is an indicator calculated from nine plasma parameters by an MLP, and the occurrence of a major disruption is predicted when the stability level decreases to a certain level, named the 'alarm level'.

The authors in [2] combined multiple plasma diagnostic signals to provide a composite impending disruption warning indicator. To take into account that disruption precursors appear at different time instants in different pulses, an off-line clustering procedure automatically selects the training set samples.

The work presented in [9] was performed during the flat-top of JET pulses characterized by a single-null plasma. The authors trained an MLP to forecast disruptive events at JET up to 100 ms in advance.

In [10] two neural approaches (Self-Organizing Maps and Support Vector Machines) are used to determine the novelty of the output of the neural disruption predictor. The novelty detector is used to assess the reliability of the network output, i.e., samples having a low confidence have to be discarded and used off-line to update the disruption predictor.

In [11], a Support Vector Machine has been developed to realize a predictor and a novelty detector in a single system. This approach tries to overcome the drawback of neural network ageing by analyzing the Support Vector decision function. It is widely recognized that the performance of a neural network deteriorates when completely new patterns are presented to it. With this approach the authors found that the novelty detector justifies many of the missed alarms of the predictor, as they are recognized to belong to unexplored regions of the operational space.

III. KERNEL MACHINES: A GEOMETRICAL APPROACH

In this paper, a new constructive algorithm to design Multilayer Perceptron networks used as classifiers is presented. The resulting networks are able to classify a finite set of patterns defined in a real domain. The proposed procedure automatically determines both the number of neurons and the synaptic weights of networks with a single hidden layer.

In particular, a geometrical approach to learning is proposed as an alternative to the classic black-box approach. Following the geometrical interpretation, each element of the network has a well-defined task, and the consequences of removing a part of it can be forecast with precision. More precisely, each neuron in the network behaves as a linear separator, and the coefficients of the hyperplane that defines this separator are equal to the connection weights converging on that neuron.

In the proposed approach, as in Support Vector Machines (SVMs), the patterns in the input space are projected into a feature space where they become linearly separable. The synthesis of the hidden layer consists of determining a minimal number of neurons, i.e., the dimension of the feature space, and the connection weights. This corresponds to finding a set of hyperplanes in the input space having a prefixed margin from the training points.

In order to perform that projection, step functions are used as kernel functions. Consequently, the decision surfaces in the input space are combinations of linear separators, and this allows us to use Linear Programming procedures to perform the mapping of the input space, rather than of the feature space.
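To see why Linear Programming is enough, the search for one separating hyperplane can be written as a linear program. The formulation below is a minimal sketch under our own assumptions, not necessarily the one used in [4]: the two groups to be separated are $\{x_i\}_{i \in P}$ and $\{x_j\}_{j \in N}$ in $\mathbb{R}^n$ (both non-empty), and the weights are box-constrained so that the optimum is bounded.

```latex
\begin{aligned}
\max_{w \in \mathbb{R}^n,\; b \in \mathbb{R},\; t \in \mathbb{R}} \quad & t \\
\text{s.t.} \quad & w^\top x_i + b \ge t, && i \in P, \\
& -\left(w^\top x_j + b\right) \ge t, && j \in N, \\
& -1 \le w_k \le 1, && k = 1, \dots, n.
\end{aligned}
```

The groups are linearly separable if and only if the optimum $t^\star$ is positive. In that case every training point lies at a Euclidean distance of at least $t^\star / \lVert w \rVert_2 \ge t^\star / \sqrt{n}$ from the hyperplane $w^\top x + b = 0$, so a margin prescribed in the input space can be checked directly on the LP solution.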

Moreover, the distance between the decision surfaces and the nearest patterns (margin) can be fixed directly in the input space, rather than in the feature space as in SVMs. This makes it possible to keep the generalization capability of the neural classifier under control. In particular, if the sets to be classified are separable (even if they are not linearly separable), and provided that the points in the training set are representative of the whole set, the generalization capability of the machine can be evaluated as a function of the margin value fixed by the designer.
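Because each hidden neuron is a linear separator whose weights are the hyperplane coefficients, the margin fixed by the designer can be checked directly in the input space. The sketch below is a minimal illustration under our own assumptions: Euclidean distance, a step activation that outputs 1 strictly on the positive side of the hyperplane, and hypothetical function names (`signed_distance`, `step_neuron`, `respects_margin`) that do not come from [4].

```python
import math
from typing import Sequence

Vector = Sequence[float]


def signed_distance(w: Vector, b: float, x: Vector) -> float:
    """Signed Euclidean distance of x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    return (sum(wk * xk for wk, xk in zip(w, x)) + b) / norm


def step_neuron(w: Vector, b: float, x: Vector) -> int:
    """Step-activation neuron: 1 on the positive side of its hyperplane, else 0."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b > 0 else 0


def respects_margin(w: Vector, b: float, positives, negatives, delta: float) -> bool:
    """True if every positive point lies on the positive side and every negative
    point on the negative side of the hyperplane, each at distance >= delta."""
    return (all(signed_distance(w, b, x) >= delta for x in positives)
            and all(-signed_distance(w, b, x) >= delta for x in negatives))


# Hypothetical data: the hyperplane x1 + x2 - 1 = 0 separating two small groups.
w, b = (1.0, 1.0), -1.0
pos = [(1.0, 1.0), (2.0, 0.5)]
neg = [(0.0, 0.0), (0.2, 0.1)]
print(respects_margin(w, b, pos, neg, delta=0.4))  # True: nearest point is ~0.495 away
print(respects_margin(w, b, pos, neg, delta=0.6))  # False
```

Checking the margin point by point keeps the designer's parameter in input-space units, which is the property the paragraph above contrasts with SVMs.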

Let us consider a two-class classification problem. The extension to k classes is straightforward [4]. The constructive algorithm, which finds the set of hyperplanes that map the input space, is based on a theorem whose proof is reported in [4]. The algorithm consists of two phases. First, the sets of points are enveloped by means of polyhedrons; then these polyhedrons are separated by means of hyperplanes.

Fig. 1. Mapping procedure in the input space

If the polyhedrons do not intersect, the two sets of points are linearly separable.

If they intersect each other, the polyhedrons are recursively separated by means of hyperplanes, such that each hyperplane separates a group of points belonging to the same class from the residual set. In this way, each hyperplane has to separate two linearly separable sets of points. The recursion ends when the polyhedrons of the two remaining sets of points do not intersect each other.

Note that the facets of the polyhedrons tell us how to subdivide the classes into groups that are linearly separable from the rest of the points.

As an example, let us consider a 2-dimensional input space, as shown in Fig. 1. The enveloping polyhedrons intersect each other (chequered area in Fig. 1), hence the sets are not linearly separable, and the recursive procedure described above has to be performed. The facets of the intersection identify two sets of points, which are enveloped by the grey polyhedrons in Fig. 1. These two sets can be linearly separated from the remaining points by the hyperplanes H1 and H2. The same procedure is repeated for the points inside the intersection, obtaining the hyperplane H3.

The separating process has divided the points into 5 groups. The normals of the separating hyperplanes then become the axes of the reference system in the feature space (in this case there are 3 separating hyperplanes, which generate a 3D feature space, as shown in Fig. 2). More precisely, each axis is perpendicular to its corresponding separating hyperplane, and the coordinate calculated along the normal becomes a coordinate in the feature space after being passed through the activation function. Due to the step activation function, the points in the input space correspond to vertices of a unit hypercube in the feature space.

A group lying on the positive side of a hyperplane corresponds to a coordinate equal to 1 on the respective axis, and 0 otherwise. For example, referring to Fig. 2, group G1 lies on the positive side of hyperplanes 1 and 2 and on the negative side of hyperplane 3, so it is projected onto the vertex (1,1,0) in the feature space, while group G3 is projected onto the vertex (0,0,1).
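The projection rule just described is exactly what a hidden layer of step neurons computes. The sketch below assumes hypothetical 2D hyperplanes standing in for H1, H2 and H3 (the real ones of Fig. 1 are not given numerically) and picks one sample point per group; `project_to_vertex` is an illustrative name, not from [4].

```python
from typing import List, Sequence, Tuple

# A hyperplane w.x + b = 0, stored as (weights, bias).
Hyperplane = Tuple[Sequence[float], float]


def project_to_vertex(hyperplanes: List[Hyperplane], x: Sequence[float]) -> Tuple[int, ...]:
    """Map an input point to a vertex of the unit hypercube in the feature space.

    Coordinate k is 1 if x lies on the positive side of hyperplane k, else 0,
    i.e. the output of the k-th step-activation hidden neuron.
    """
    return tuple(
        1 if sum(wk * xk for wk, xk in zip(w, x)) + b > 0 else 0
        for w, b in hyperplanes
    )


# Hypothetical hyperplanes playing the roles of H1, H2, H3.
H = [((1.0, 0.0), -1.0),    # H1: positive side where x1 > 1
     ((0.0, 1.0), -1.0),    # H2: positive side where x2 > 1
     ((-1.0, -1.0), 1.0)]   # H3: positive side where x1 + x2 < 1

print(project_to_vertex(H, (2.0, 2.0)))  # (1, 1, 0), like group G1
print(project_to_vertex(H, (0.2, 0.3)))  # (0, 0, 1), like group G3
```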

Once all the groups are projected, the separating hyperplane in the feature space can be evaluated.

The approach avoids the typical local minima problems of Error Back Propagation (more generally, local minima are a drawback of any deterministic descent method, even of the Levenberg-Marquardt algorithm) and assures the convergence of the method.
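The two stages (projection onto hypercube vertices, then one separating hyperplane in the feature space) can be illustrated end to end on XOR, the smallest two-class problem that is not linearly separable in its input space. The weights below are hand-chosen for illustration and are an assumption: they are not produced by the constructive algorithm of [4]. Inputs (0,0), (0,1)/(1,0) and (1,1) land on the vertices (0,0), (1,0) and (1,1), which a single hyperplane separates.

```python
from typing import List, Sequence, Tuple

Hyperplane = Tuple[Sequence[float], float]  # (weights, bias) of w.z + b = 0


def step(z: float) -> int:
    """Step activation: 1 strictly on the positive side, 0 otherwise."""
    return 1 if z > 0 else 0


def two_layer_forward(hidden: List[Hyperplane], output: Hyperplane, x: Sequence[float]) -> int:
    """Single-hidden-layer step MLP: project x onto a hypercube vertex, then
    classify that vertex with one separating hyperplane in the feature space."""
    vertex = [step(sum(wk * xk for wk, xk in zip(w, x)) + b) for w, b in hidden]
    w_out, b_out = output
    return step(sum(wk * hk for wk, hk in zip(w_out, vertex)) + b_out)


# Hypothetical hand-chosen weights for XOR.
hidden = [((1.0, 1.0), -0.5),   # positive side: x1 + x2 > 0.5
          ((1.0, 1.0), -1.5)]   # positive side: x1 + x2 > 1.5
output = ((1.0, -1.0), -0.5)    # feature-space hyperplane h1 - h2 = 0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, two_layer_forward(hidden, output, x))  # outputs 0, 1, 1, 0
```

No gradient descent is involved: every weight is a hyperplane coefficient fixed by geometry, which is why the local-minima issue of back-propagation does not arise in this setting.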

In [4] the performance of the proposed algorithm has been tested on some benchmark problems that address the main issues related to classifiers. The results showed that the proposed approach is faster than the compared methods in synthesizing the classifier, in particular when the training set has a large number of examples; this will be useful for fusion reactor databases, where a large number of patterns have to be classified and predicted. Furthermore, the resulting classifiers are smaller, and the robustness of the classification can easily be fixed a priori.

IV. DISRUPTION PREDICTION

The scientific community considers disruptions one of the main obstacles to tokamak operation, and, even though disruption dynamics have been studied for a long time, a complete analysis has not been developed yet.

All works reported in the literature are mainly based on the observation of more or less large databases, and some important aspects have emerged as a first step towards studying the phenomenon and classifying it.

Fig. 2. 2D input space mapping and projection in 3D feature space

During an experimental campaign, it is crucial to know the causes of disruptions in order to manage the machine parameters and avoid disruptions in subsequent experiments.

In the present paper, a large database has been built based on hundreds of diagnostic signals available on the JET experimental machine.

First, a set of diagnostic signals has been chosen to develop an automatic classifier able to recognize disruptions quickly and unambiguously. In particular, following the advice of plasma physicists, nine diagnostic signals have been chosen to describe the plasma regime during the discharge flat-top. The choice of the signals takes into account physical considerations and the availability of real-time data. Moreover, they represent global tokamak parameters, hence they do not depend on a specific machine. The appropriateness of the selected signals has subsequently been validated with saliency and sensitivity analysis.

The database consists of 172 disruptive pulses and 102 safe pulses divided into two sets: the training set, consisting of 86 disruptive pulses, and the test set, consisting of 86 disruptive and 102 safe pulses. No validation set has been used here, because the geometrical approach does not need any cross-validation procedure.

Let us consider a disruptive pulse. A time instant t_prec can be identified that discriminates between stable and unstable states of the plasma; some disruption precursors are expected to appear in the time window from t_prec to the disruption time. Thus, two consecutive phases can be identified, before and after t_prec: the non-disruptive phase and the disruptive phase. Unfortunately, t_prec does not have a prefixed value, and identifying the two phases is often a very difficult task. Presently, indicators of the transition from one phase to the other are not available.

Nevertheless, as the considered predictor is a supervised neural network, for each disruptive pulse we have to distinguish between samples belonging to the non-disruptive phase and samples belonging to the disruptive phase, in order to associate a different network output with each. In particular, the samples of a disruptive pulse that belong to the non-disruptive phase are associated with a null network output, as are all the samples of non-disruptive pulses, whereas the samples of a disruptive pulse that belong to the disruptive phase are associated with a network output equal to one. This is one of the main issues in the design of a disruption prediction system and, in particular, in the generation of the training set (when the predictor is an MLP, as in the present case, where the geometrical approach is used to develop an MLP predictor).
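The target assignment described above can be sketched as follows, assuming t_prec is already known for each disruptive pulse (in this paper it is derived from a per-pulse SOM, which the sketch does not implement) and that samples at or after t_prec belong to the disruptive phase. `label_pulse` and the sample times are hypothetical.

```python
from typing import List, Optional, Sequence


def label_pulse(times: Sequence[float], t_prec: Optional[float]) -> List[int]:
    """Target network outputs for the samples of one pulse.

    t_prec is None for a safe pulse: every sample gets target 0.
    For a disruptive pulse, samples before t_prec (non-disruptive phase) get 0
    and samples from t_prec onwards (disruptive phase) get 1.
    """
    if t_prec is None:
        return [0] * len(times)
    return [1 if t >= t_prec else 0 for t in times]


times = [60.0, 60.1, 60.2, 60.3, 60.4]   # hypothetical sample times [s]
print(label_pulse(times, t_prec=60.25))   # [0, 0, 0, 1, 1]
print(label_pulse(times, t_prec=None))    # [0, 0, 0, 0, 0]
```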

Hence, predicting a disruption can be modeled as a two-class classification problem.

A disruption predictor is built wherein multiple plasma diagnostic signals are combined to provide a composite impending disruption warning indicator.

The realized predictive system consists of two mutually connected blocks: a Self-Organizing Map (SOM) and the Geometrical Kernel Machine (GKM). In order to build the training set for the GKM predictor, 86 SOMs have been constructed, one for each pulse in the training set. Each SOM is used to discriminate between the non-disruptive and disruptive phases, i.e., to label each sample of a pulse as a 'safe' sample or a 'disrupted' sample, thereby encoding information about the proximity of the disruption.

The GKM is built by using the proposed geometrical approach in order to classify safe and disrupted samples, labeled 0 and 1 respectively. During the on-line application, the GKM is fed with all the samples of a pulse and, for each of them, returns a label equal to 0 (the sample belongs to the non-disruptive phase of a disruptive pulse, or to a safe pulse) or 1 (the sample belongs to the disruptive phase of a disruptive pulse, and the alarm is triggered).
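The on-line use just described can be sketched as a loop that feeds the samples of a pulse to the classifier in time order and returns the time of the first sample labeled 1. This assumes the alarm is latched at that first positive label; `first_alarm_time` and the threshold classifier are hypothetical stand-ins, not the trained GKM.

```python
from typing import Callable, Optional, Sequence


def first_alarm_time(
    times: Sequence[float],
    samples: Sequence[Sequence[float]],
    classify: Callable[[Sequence[float]], int],
) -> Optional[float]:
    """Return the time of the first sample labeled 1 (alarm), or None."""
    for t, sample in zip(times, samples):
        if classify(sample) == 1:
            return t
    return None


def toy_gkm(sample: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained GKM: a threshold on one signal."""
    return 1 if sample[0] > 0.8 else 0


times = [10.0, 10.1, 10.2, 10.3]
samples = [(0.1,), (0.5,), (0.9,), (0.95,)]
print(first_alarm_time(times, samples, toy_gkm))  # 10.2
```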

The following considerations have to be taken into account when designing a disruption prediction system [1-3]:
- the prediction success rate has to be greater than that of the existing alarm systems;
- the prediction time has to anticipate the onset of the disruption sufficiently to allow the mitigation and shut-down systems to intervene safely;
- the false alarm rate has to be limited;
- at the same time, the missed alarm rate should be as low as possible;
- the prediction system has to be able to forecast different types of disruptions, characterized by different operational scenarios and dynamics;
- the prediction system has to be able to operate in real time.

Hence, the performance of the prediction system is evaluated here in terms of the Percentage of False Alarms (PFA), defined as the ratio between the number of safe pulses predicted as disruptive and the total number of safe pulses, in percent, and the Percentage of Missed Alarms (PMA), defined as the ratio between the number of disruptive pulses predicted as safe and the total number of disruptive pulses, in percent. Note that a disruption prediction is considered successful if the system correctly predicts the disruption at least t_pred seconds before the disruption time, where t_pred depends on the machine dynamics. Moreover, for disruptive pulses, the Percentage of Premature Alarms (PPA) is defined as the ratio between the number of premature alarms and the number of disruptive pulses, in percent. A disruption prediction is premature if the system predicts the disruption too far in advance: at JET, the prediction is considered premature if the system triggers the alarm at least 1 second before the disruption time. Finally, the Prediction Success Rate (PSR) is defined as the success rate of the predictor in correctly predicting both disruptive and safe pulses.

Fig. 3. Wrapping hyper-box and novel regions, 2D case.

As previously mentioned, the geometrical approach generates an MLP neural network able to correctly classify the training patterns coming from the machine. Compared with other kernel machine approaches, such as SVMs, it does not need any time-consuming parameter tuning phase or trial-and-error training phase, and, due to its geometrical characteristics, it avoids overfitting. The only parameter to be set is the minimum distance between samples and hyperplanes.
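The pulse-level definitions of PMA, PFA, PPA and PSR given earlier in this section can be sketched as follows. Assumptions, not stated in the text: each pulse gets exactly one outcome, an alarm fired less than t_pred before the disruption (a tardy alarm) counts as missed, and t_pred = 0.04 s is a purely hypothetical value. The 1 s premature threshold is the JET convention quoted above; all names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# One pulse: (is_disruptive, disruption_time, alarm_time), times in seconds;
# disruption_time is None for safe pulses, alarm_time is None if no alarm fired.
Pulse = Tuple[bool, Optional[float], Optional[float]]


def pulse_outcome(pulse: Pulse, t_pred: float, t_premature: float = 1.0) -> str:
    is_disruptive, t_disr, t_alarm = pulse
    if not is_disruptive:
        return "false_alarm" if t_alarm is not None else "correct"
    if t_alarm is None:
        return "missed"
    assert t_disr is not None, "disruptive pulses need a disruption time"
    warning = t_disr - t_alarm        # how long before the disruption the alarm fired
    if warning >= t_premature:        # JET: at least 1 s early is premature
        return "premature"
    if warning >= t_pred:             # early enough for mitigation
        return "correct"
    return "missed"                   # assumption: a tardy alarm counts as missed


def alarm_rates(pulses: List[Pulse], t_pred: float,
                t_premature: float = 1.0) -> Dict[str, Optional[float]]:
    outcomes = [pulse_outcome(p, t_pred, t_premature) for p in pulses]
    n_disr = sum(1 for p in pulses if p[0])
    n_safe = len(pulses) - n_disr

    def pct(count: int, total: int) -> Optional[float]:
        return 100.0 * count / total if total else None  # None when undefined

    return {
        "PMA": pct(outcomes.count("missed"), n_disr),
        "PPA": pct(outcomes.count("premature"), n_disr),
        "PFA": pct(outcomes.count("false_alarm"), n_safe),
        "PSR": pct(outcomes.count("correct"), len(pulses)),
    }


pulses = [
    (True, 20.0, None),    # no alarm: missed
    (True, 20.0, 19.95),   # 50 ms warning: correct with t_pred = 0.04 s
    (True, 20.0, 18.5),    # 1.5 s warning: premature
    (True, 20.0, 19.99),   # 10 ms warning: tardy, counted as missed
    (False, None, 12.0),   # alarm on a safe pulse: false alarm
    (False, None, None),   # safe pulse without alarm: correct
]
print(alarm_rates(pulses, t_pred=0.04))  # PMA 50%, PPA 25%, PFA 50%, PSR about 33.3%
```

Returning None for an empty denominator mirrors the training column of Table I, where no safe pulses are available and PFA is undefined.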

V. NOVELTY DETECTION

Novelty Detection (ND) consists of identifying new or unknown data that a machine learning system was not exposed to during the training phase. Thus, ND is one of the fundamental requirements of a good classification or prediction system. In fact, real data may contain patterns belonging to operational regions that were not explored when the learning system was developed. This could be the case for the disruption predictor presented in this paper, where new plasma configurations might present features completely different from those observed in the experiments selected for the training set. This 'novelty' can lead to incorrect behavior of the GKM predictor.

In the last ten years novelty detection has received increasing attention, and a number of techniques have been proposed and investigated to address it. In [12-13] the authors highlighted that it is not possible to identify a single best model a priori, and that the success of a novelty detection technique mainly depends on the statistical properties of the data handled.

Both statistical and neural clustering methods can be used for novelty detection tasks.

In the present paper, novelty detection is performed by exploring the unmapped regions of the hypercube in the feature space: all unmapped vertices are novel. In addition, all patterns falling outside the hyper-box that surrounds the entire training set in the input space can be considered novel. This hyper-box is built with hyperplanes parallel to the coordinate reference planes.

As said above, during the training phase the input set is separated into groups that are projected onto the vertices of a hypercube according to their position with respect to the separating hyperplanes. When a test set is presented to the network, the same projection is performed; if part of the test set falls onto an unmapped vertex of the hypercube, it is considered novel.

Fig. 2 shows the unmapped regions and all the free vertices of the hypercube. These areas can be considered novel because they belong to unexplored input regions.

As for the wrapping hyper-box, Fig. 3 shows the details for a 2D case. All the areas falling outside this 'cover' belong to unexplored regions, hence nothing precise can be said about them.
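The two novelty criteria described in this section (an unmapped hypercube vertex, or a point outside the wrapping hyper-box) can be combined in a few lines. This is a minimal sketch under our own assumptions: the hyper-box is the tight axis-aligned min/max box of the training inputs with no extra margin, and the 2D hyperplanes, training points and class name are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Hyperplane = Tuple[Sequence[float], float]  # (weights, bias) of w.x + b = 0


def project(hyperplanes: List[Hyperplane], x: Sequence[float]) -> Tuple[int, ...]:
    """Hypercube vertex reached by x: one step-activation bit per hyperplane."""
    return tuple(1 if sum(wk * xk for wk, xk in zip(w, x)) + b > 0 else 0
                 for w, b in hyperplanes)


class VertexBoxNoveltyDetector:
    """Flags x as novel if it lies outside the axis-aligned hyper-box wrapping
    the training set, or on a hypercube vertex that no training point reached."""

    def __init__(self, hyperplanes: List[Hyperplane], training: List[Sequence[float]]):
        self.hyperplanes = hyperplanes
        self.mapped: Set[Tuple[int, ...]] = {project(hyperplanes, x) for x in training}
        columns = list(zip(*training))          # one tuple per input coordinate
        self.low = [min(c) for c in columns]
        self.high = [max(c) for c in columns]

    def is_novel(self, x: Sequence[float]) -> bool:
        outside_box = any(not (lo <= xk <= hi)
                          for xk, lo, hi in zip(x, self.low, self.high))
        return outside_box or project(self.hyperplanes, x) not in self.mapped


H = [((1.0, 0.0), -1.0), ((0.0, 1.0), -1.0)]   # hyperplanes x1 = 1 and x2 = 1
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]   # reaches vertices (0,0), (1,0), (0,1)
nd = VertexBoxNoveltyDetector(H, train)
print(nd.is_novel((0.5, 0.5)))  # False: mapped vertex (0, 0), inside the box
print(nd.is_novel((2.0, 2.0)))  # True: vertex (1, 1) was never mapped
print(nd.is_novel((3.0, 0.0)))  # True: outside the wrapping hyper-box
```

Both checks reuse quantities already available after training (the hidden-layer hyperplanes and the training inputs), so no separate novelty model has to be fitted.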

VI. RESULTS

As can be noted from Table I, the GKM predictor is able to correctly classify all the samples of the training set. Unfortunately, its performance on the test set is not very good, especially as regards false alarms.

When the GKM predictor is used for ND, as described in the previous section, the performance is modified as follows:

* 12 of the 28 missed alarms are novel;
* 19 of the 28 disruptive pulses triggering a premature alarm are novel;
* 45 of the 89 pulses correctly predicted by the predictor are labeled as novel by the ND.

Table II compares the performance of the GKM predictor and of the GKM-ND in terms of the number of missed alarms (MA), false alarms (FA), and premature alarms (PA). As can be noted, the numbers of MA, FA, and PA decrease considerably. Unfortunately, as expected, the discrimination capability of the system in the on-line application decreases.


TABLE I - GKM PREDICTOR PERFORMANCE IN TERMS OF PERCENTAGE OF FALSE ALARMS (PFA), PERCENTAGE OF MISSED ALARMS (PMA), PERCENTAGE OF PREMATURE ALARMS (PPA), AND PREDICTION SUCCESS RATE (PSR).

       Training Set    Test Set
PMA    0% (0/69)       32% (28/86)
PFA    -               42% (43/102)
PPA    0% (0/69)       32% (28/86)
PSR    100% (69/69)    47% (89/188)
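The test-set counts in Table I can be cross-checked with a little arithmetic. Assuming every test pulse falls into exactly one outcome category, the 89 correctly predicted pulses are the 188 test pulses minus the 28 missed, 43 false and 28 premature alarms, and the integer percentages in the table coincide with truncating (rather than rounding) the ratios, e.g. 28/86 ≈ 32.6%. The last line relates the 12 novel missed alarms listed in the Results text to the GKM-ND column of Table II.

```python
# Test-set counts reported in Table I (86 disruptive and 102 safe pulses).
n_disruptive, n_safe = 86, 102
missed, false_alarms, premature = 28, 43, 28

# Assumption: every test pulse falls into exactly one outcome category.
correct = n_disruptive + n_safe - missed - false_alarms - premature


def pct(count: int, total: int) -> float:
    return 100.0 * count / total


rates = {
    "PMA": pct(missed, n_disruptive),            # 32.56...
    "PFA": pct(false_alarms, n_safe),            # 42.16...
    "PPA": pct(premature, n_disruptive),         # 32.56...
    "PSR": pct(correct, n_disruptive + n_safe),  # 47.34...
}
print(correct)                                   # 89
print({k: int(v) for k, v in rates.items()})     # truncated: 32, 42, 32, 47

# 12 of the 28 missed alarms are flagged as novel by the ND.
print(missed - 12)                               # 16, the GKM-ND MA count in Table II
```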

TABLE II - GKM PREDICTOR AND GKM NOVELTY DETECTOR PERFORMANCE IN TERMS OF NUMBER OF FALSE ALARMS (FA), MISSED ALARMS (MA), AND PREMATURE ALARMS (PA).

      GKM    GKM-ND
MA    28     16
FA    43     10
PA    28     7

VII. CONCLUSIONS

The unavoidable ageing of a neural prediction system is an important issue for experimental machines, where new states of the plasma are explored. Hence, it is crucial to have a system able to measure the reliability of the predictor output and to automatically update the predictor for plasma configurations not used during the training phase. The proposed Novelty Detection technique appears promising for enhancing the reliability of the Geometrical Kernel Machine predictor.

In this paper a new kernel machine is proposed that integrates prediction and novelty detection capabilities in a single system.

Using the knowledge acquired during the training phase of the predictor, the Geometrical Kernel Machine is able to detect the novelty of new pulses, increasing the performance of the entire system.

In particular, the GKM used as a novelty detector is able to justify many of the missed alarms and false alarms of the predictor, as they are recognized to belong to unexplored regions of the operational space.

ACKNOWLEDGMENT

The authors would like to thank Mike Johnson and David Howell for providing the manual classification of the disruptions, and Tim Hender, Richard Buttery and Simon Pinches for supporting the work and for the useful discussions.

REFERENCES

[1] G. Pautasso, et al., "On-line prediction and mitigation of disruption in ASDEX Upgrade," Nuclear Fusion, vol. 42, no. 1, pp. 100-108, 2002.

[2] B. Cannas, A. Fanni, G. Sias, P. Sonato, M. K. Zedda, and JET EFDA contributors, "Neural approaches to disruption prediction at JET," in Proc. of 31st EPS Conf. on Plasma Phys., 28G, London, 2004, P-1.167.

[3] R. Yoshino, "Neural-net disruption predictor in JT-60U," Nuclear Fusion, vol. 43, pp. 1771-1786, 2003.

[4] R. Delogu, A. Fanni, and A. Montisci, "Geometrical Synthesis of MLP Neural Networks," Neurocomputing, to be published.

[5] C. M. Bishop, P. S. Haynes, M. E. U. Smith, T. N. Todd, and D. L. Trotman, "Real-time control of a tokamak plasma using neural network," Neural Computation, vol. 7, no. 1, pp. 206-217, 1995.

[6] J. Svensson, "Disruption prediction for JET optimized shear discharges," JET internal report.

[7] D. Wroblewski and G. L. Jahns, "Tokamak disruption alarm based on a neural network model of the high-β limit," Nuclear Fusion, vol. 37, no. 6, pp. 725-741, 1997.

[8] Th. Zehetbauer, et al., "Real-time disruption handling at ASDEX Upgrade," Fusion Engineering and Design, vol. 56-57, pp. 721-725, Oct. 2001.

[9] B. Cannas, A. Fanni, E. Marongiu, and P. Sonato, "Disruption forecasting at JET using neural networks," Nuclear Fusion, vol. 44, pp. 68-76, 2004.

[10] B. Cannas, A. Fanni, P. Sonato, and M. K. Zedda, "Novelty Detection for On-Line Disruption Prediction Systems," in Proc. of 32nd EPS Conf. on Plasma Phys., Tarragona, Spain, 29C P 5.05 8, 2005.

[11] B. Cannas, R. S. Delogu, A. Fanni, P. Sonato, and M. K. Zedda, "Support Vector Machines for disruption prediction and novelty detection at JET," in Proc. of SOFT Conference, Warsaw, Poland, 2006.

[12] M. Markou and S. Singh, "Novelty Detection: a review - part 1: statistical approaches," Signal Processing, vol. 83, pp. 2481-2497, 2003.

[13] M. Markou and S. Singh, "Novelty Detection: a review - part 2: neural network based approaches," Signal Processing, vol. 83, pp. 2499-2521, 2003.
