Fault Diagnosis of Single-Variate Systems Using a Wavelet-Based Pattern Recognition Technique


Post on 12-Feb-2017






<ul><li><p>Fault Diagnosis of Single-Variate Systems Using a Wavelet-Based Pattern Recognition Technique</p><p>Fardin Akbaryan and P. R. Bishnoi*</p><p>Department of Chemical and Petroleum Engineering, The University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N 1N4, Canada</p><p>A pattern recognition-based methodology is presented for fault diagnosis of a single-variate, dynamic system. The wavelet coordinates that discriminate most efficiently among the classes of events are determined according to the local discriminant basis (LDB) method and a principal component analysis (PCA) technique. The proposed feature extractor couples the LDB method with the double wavelet packet tree in order to determine the configuration of pattern windows that yields the most discrimination among classes. Lifting scheme-based wavelet filters are used so that the required computation time is reduced significantly without degrading the robustness of the method. To reduce the size of the feature space, the wavelet coordinates are projected, by using a PCA technique, into a new low-dimensional space where minimum correlation exists among the new variables. The tuning of some parameters that affect the performance of the approach is also discussed. The feature classifier is a binary decision tree that employs a soft-thresholding scheme for recognition of a noisy input pattern. The performance of the proposed technique is examined on a classification benchmark problem and on fault classification problems for the Tennessee Eastman process. The proposed pattern recognition methodology is observed to classify noisy input patterns into the known classes of events satisfactorily.</p><p>1. Introduction</p><p>A system operates in a faulty condition when its behavior deviates considerably from normal, predefined operating strategies. 
Equipment failure, sensor degradation, set-point change, and disturbances in the input streams are instances of faulty states for a system. The first group of faults, known as deterministic faults, is generated by a fixed-magnitude cause and is usually damped by using a robust control strategy. Various magnitudes of a deterministic cause, even at different operating points, produce faults with similar trends. The second group, known as stochastic faults, results from causes whose magnitude changes randomly with time. Moreover, the control scheme cannot drive the system back to a steady-state operating condition. A stochastic fault, even at the same initial operating point, could have different patterns.</p><p>Fault diagnosis is an important part of the process supervisory routines; it determines the state of the system (faulty or normal) as well as the types of faults. Analytical model-based,<sup>1-5</sup> causal analysis,<sup>6,7</sup> and pattern recognition<sup>8-13</sup> methods are the main groups of fault diagnosis approaches. Chemical processes are often characterized by nonlinear behavior, noisy inputs, and unknown parameters. Thus, a model describing the system behavior, either mathematically or qualitatively, will be quite complicated.<sup>1-3</sup> However, computer-based pattern recognition extracts a wealth of information from the large amount of process data quite satisfactorily without concern about the nature of the system. Some of the fault diagnosis methods assume that the fault occurrence drives the system to a new steady-state condition. The system characteristics at two different operating points are then used for diagnosis purposes.<sup>4,5,11,12</sup> If the system happens to return to its initial steady-state condition, these methods will not be useful for fault diagnosis. As another shortcoming, these approaches cannot deal with stochastic faults because the system cannot reach a steady-state point. 
If transient trends of system variables are used as patterns, the fault diagnosis method is freed from considering the steady-state conditions.<sup>8-10,13</sup> This implies that the diagnosis method is equally applicable to any type of fault and final system condition.</p><p>In the present work, we propose a supervised pattern recognition methodology for fault diagnosis of single-variate, dynamic systems. The patterns are transient trends of a process variable resulting from a disturbance in the system. The technique assesses the similarity of new, unknown patterns with the prototypes of each class, and the most similar class is considered as the source of faulty behavior. Because transient trends contain valuable information scattered within the time and frequency domains, the feature extractor must be equally efficient in both domains. A multiscale wavelet-based transform serves in this work as the feature extractor. The Fourier transform (FT) suffers from an inability to extract time-domain information.<sup>14</sup> Although the short-time FT (STFT) is able to process temporal features, its performance is inferior to that of the wavelet transform (WT), especially for short-lived segments of a pattern.<sup>14</sup> The local discriminant basis (LDB) method<sup>15</sup> is modified in this work and used as the basis of the proposed feature extractor. In addition to the whole-line wavelet filters used in the original LDB method, the lifting scheme (LS)-based wavelet filters<sup>14</sup> are employed by the feature extractor. As the main advantage, the LS filters require less computation time than the whole-line wavelet filters. The information content of a signal depends on the length of the data sequence.</p><p>* To whom correspondence should be addressed. Tel: (403) 220-6695. Fax: (403) 282-3945. E-mail: bishnoi@ucalgary.ca.</p><p>3612 Ind. Eng. Chem. Res. 2001, 40, 3612-3622</p><p>10.1021/ie000779l CCC: $20.00 2001 American Chemical Society. Published on Web 07/13/2001</p></li><li><p>
A large window of data gives more information about the general trend of a pattern, whereas small windows focus more on the local structure of a pattern. The proposed feature extractor is able to choose the best set of nonoverlapping windows adaptively so that the selected features for each class are maximally discriminated. This helps the feature classifier define a more robust decision scheme. The proposed feature classifier is based on the binary decision tree (DT) approach implemented in many classification routines. A DT-based classifier is trained easily and needs few a priori assumptions. The proposed tree classifies the extracted features according to a soft-thresholding technique. The tree determines the a posteriori probabilities that the extracted features belong to different classes.</p><p>For ease of understanding, the frameworks of multiscale feature extraction, the wavelet-based transforms, and the induction technique for the DT classifier are introduced first. The proposed methodology for feature extraction and classification, given a set of noisy data, is then described. To demonstrate the efficacy of the proposed algorithm, it is applied to simulated data.</p><p>2. Background on Pattern Recognition</p><p>Pattern recognition is an algorithm that determines the most appropriate class of event(s) for a given unlabeled input pattern. To recognize an unknown pattern, two main steps are usually followed: (1) feature extraction and (2) feature classification.</p><p>2.1. Feature Extraction. The transient behavior of a chemical process shows the effects of different physicochemical events such as process dynamics, sensor noise, faults, and external loads. These events, known as features, can be observed over different time and frequency ranges. Filtering is a conventional technique for extracting the features. A filtering technique that explores the entire range of the frequency and time domains simultaneously is more reliable for extracting the features. 
Multiresolution analysis (MRA) of a pattern is considered a reliable basis for filtering the pattern in the time and frequency domains.<sup>10,14,16</sup> A linear time-frequency representation (TFR) aims to present a pattern \(y(t)\) as a weighted summation of basis functions (eq 1), where \(\psi_i(t)\) stands for the basis function, \(c_i\) is the weighting factor, and \(N\) is the number of sample points. The MRA of a pattern combines the pattern's TFR at different sampling rates, i.e., resolutions, so that the fine details and the general trend of the pattern can be obtained with a desirable accuracy.<sup>14</sup></p><p>The WT<sup>14</sup> is considered a highly efficient approach for the MRA. The WT tiles the time-frequency plane effectively such that the main features of a pattern, located at various frequencies and times, are extracted with minimum redundancy. The wavelet basis functions are well localized in the time and frequency domains.</p><p>2.1.1. Wavelet Packet. The wavelet packet transform (WPT) is a generalized version of the WT that also decomposes the high-frequency bands that the WT leaves intact. Unlike the WT, the WPT decomposes the pattern into more frequency bands at each time scale so that an overcomplete set of wavelet coefficients is generated.<sup>14</sup> By the WPT, any subspace, regardless of its type, is decomposed into two coarser subspaces (eq 2), where \(m\) represents the scale and \(f\) denotes the frequency counter for subspaces at each scale. Each subspace is called a packet, and the binary packet tree is the ensemble of all of these subspaces. Because the subspaces are redundant, a technique must be implemented to choose the best set of basis functions from them. The formulation for selecting the best packets, for classification purposes, is discussed below.</p><p>Best Basis Selection. Saito<sup>15</sup> proposed the LDB methodology, which determines a set of best packets that maximize the discrimination among different classes of data. 
In this technique, the importance of each packet is measured quantitatively by a statistical distance-based criterion termed the discriminant information function (DIF), \(D(p,q)\), where \(p\) and \(q\) are two nonnegative vectors with \(\lVert p\rVert = \lVert q\rVert = 1\). The DIF can be modeled by the J-divergence criterion (eq 3). The LDB method uses the time-frequency energy maps of the classes in order to compute the function \(D\). The time-frequency energy map of class \(c\), \(\Gamma_c\), is a table of positive real values indexed by \(m\), \(f\), and \(k\) (eq 4), where \(N_c\) is the number of patterns in the \(c\)th class and \(d^{i}_{m,f,k}\) is the \(k\)th wavelet coefficient located in the \(f\)th packet of the \(m\)th scale. The coefficient is obtained by transforming the \(i\)th pattern into a wavelet packet tree. The DIF for each packet is defined by eq 5, where \(N_C\) is the number of classes. The details of the computation steps for the LDB method can be found in the work by Saito.<sup>15</sup> The LDB reduces the complexity of the classification algorithm by retaining the most discriminant features. When the information content is dispersed throughout the entire time-frequency plane, however, retaining only a selected group of features may be far from the optimal solution. Englehart<sup>16</sup> used the PCA method to seek the best combination of all of the features in a lower dimensional space.</p><p>2.1.2. Dynamic WPT. Bakshi and Stephanopoulos<sup>17</sup> proposed the time-varying wavelet packet analysis, which is utilized mainly for on-line compression of nonstationary signals. The WT and WPT require a minimum number of data points to decompose the given signal into the next coarser scale. The WPT based on Haar wavelet filters, for instance, needs two data points to construct a two-level packet tree. 
As the number of samples increases to four, two more packet trees would be added to the ensemble of packet trees.</p><p>\[ y(t) = \sum_{i=1}^{N} c_i \psi_i(t) \qquad (1) \]</p><p>\[ \Omega_{m,f} = \Omega_{m-1,2f} \oplus \Omega_{m-1,2f+1}, \qquad m = L, L-1, \ldots, 0; \quad f = 0, 1, \ldots, 2^{m}-1 \qquad (2) \]</p><p>\[ D(p,q) = \sum_{i=1}^{n} \left( p_i \log\frac{p_i}{q_i} + q_i \log\frac{q_i}{p_i} \right) \qquad (3) \]</p><p>\[ \Gamma_c(m,f,k) = \frac{\sum_{i=1}^{N_c} \left( d^{i}_{m,f,k} \right)^2}{\sum_{i=1}^{N_c} \lVert y_i^{(c)} \rVert^2} \qquad (4) \]</p><p>\[ D\left( \{ \Gamma_c(m,f,\cdot) \}_{c=1}^{N_C} \right) = \sum_{k=1}^{2^{m}} \sum_{i=1}^{N_C-1} \sum_{j=i+1}^{N_C} D\left( \Gamma_i(m,f,k), \Gamma_j(m,f,k) \right) \qquad (5) \]</p><p>Ind. Eng. Chem. Res., Vol. 40, No. 16, 2001 3613</p></li><li><p>When packet trees of similar depth are arranged in a row, a double wavelet packet tree (DWPT) would be constructed. The nodes of this tree are the single packet trees that are established on-line during the sample collection. The configuration of the DWPT depends on the types of wavelet filters and the length of the signal. The best packet selection algorithms could be applied to find not only the best packets of each single tree but also the best set of packet trees within the double trees.</p><p>2.2. Feature Classification. DT is a supervised classifier, with some desirable properties,<sup>18</sup> that has been employed in a broad range of classification tasks. The outputs of DT are as accurate as those of the other classification algorithms such as artificial neural networks.<sup>19-20</sup> A DT consists of a series of decision and terminal nodes. At each decision node a specified test is performed on a selected element (attribute) of the input pattern. Depending on the test result, the pattern descends to another node until the pattern reaches a terminal node (leaf). A DT is constructed by recursive partitioning of data space, represented by training patterns, until stopping criteria are met at each of the terminal nodes. 
Binary trees are preferred to nonbinary ones because the former are not biased in favor of attributes with many outcomes.<sup>21</sup> Moreover, binary trees split the data space into more regions than nonbinary trees do.</p><p>The classification error depends largely on the selection of appropriate attributes for each decision node. A test on an attribute that divides the data set nontrivially is considered a potential candidate for categorizing the input instance. The incremental tree induction (ITI) model<sup>21</sup> employs a form of the Kolmogorov-Smirnov distance (KSD) to score each test for partitioning a set of examples at every decision node. For the continuous attributes, the suggested KSD would be</p><p>The matrix \(z\) represents samples whose attribute \(A\) does not retain \(A_k\) as its value, whereas the matrix \(z_1\) denotes the samples whose attribute \(A\) only has values that are smaller than \(A_k\). If two tests have similar KSDs, the ITI method breaks the tie in favor of the test whose associated attribute is lexically or numerically lower than that of the other test. To find the best test for a decision node, the most informative value is determined first for each attribute. This value is used as a criterion for the test. The best test is the one whose criterion has the maximum score. For a continuous attribute with \(m\) distinct values, given by the training set, the most informative value lies in one of \(m - 1\) disjoint intervals. The best interval in which to locate the test's threshold is the one whose upper limit has the maximum score. Although there are infinitely many choices for a threshold within the interval, the midpoint of the interval is usually taken as the threshold.<sup>12</sup></p><p>If the data are corrupted by noise or the training set is too small to represent the true discriminant function, the DT overfits the training data. Pruning the DT after it has grown completely is often used to avoid such problems. Utgoff et al.<sup>21</sup> utilized the minimum description length (MDL)<sup>22</sup> for pruning the undesired subtrees. 
If the MDL of a node as a leaf is smaller than that as a decision node, the node will be denoted as a leaf because it needs fewer bits for encoding.</p><p>3. Proposed Methodology</p><p>We propose a pattern classifier that can be used for dynamic and single-variate fault diagnosis problems....</p></li></ul>
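<p>The wavelet packet decomposition of section 2.1.1 can be illustrated with a minimal sketch. It assumes plain orthonormal Haar filters (not the lifting scheme filters used by the authors), a signal whose length is divisible by 2 raised to the tree depth, and a level index that counts splits from the root, which runs opposite to the scale index m of eq 2. The function names <code>haar_split</code> and <code>wavelet_packet_tree</code> are hypothetical and chosen only for this illustration.</p>

```python
import math


def haar_split(x):
    """Split an even-length signal into approximation and detail halves
    using the orthonormal Haar filters."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return approx, detail


def wavelet_packet_tree(x, depth):
    """Return {(level, f): coefficients}; (0, 0) holds the raw signal.

    Unlike the plain wavelet transform, every packet is split again,
    including the detail (high-frequency) packets.
    """
    if len(x) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    tree = {(0, 0): list(x)}
    for level in range(depth):
        for f in range(2 ** level):
            approx, detail = haar_split(tree[(level, f)])
            tree[(level + 1, 2 * f)] = approx
            tree[(level + 1, 2 * f + 1)] = detail
    return tree
```

<p>Because the Haar filters are orthonormal, the packets at each level of the tree carry the same total energy as the input signal, which is a convenient sanity check on an implementation.</p>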

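<p>The discriminant measure of eqs 3-5 can be sketched as follows, under stated assumptions: the wavelet packet coefficients of one packet are already available as plain lists, and a small constant <code>eps</code> is added before taking logarithms to avoid log(0), a safeguard that is not part of the formulas in the text. The function names are hypothetical.</p>

```python
import math


def j_divergence(p, q, eps=1e-12):
    """Symmetric relative entropy (J-divergence) between two nonnegative
    vectors of equal length; eps is an added guard against log(0)."""
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi + eps, qi + eps
        total += pi * math.log(pi / qi) + qi * math.log(qi / pi)
    return total


def energy_map(coeffs_per_pattern, patterns):
    """Time-frequency energy map of one class for one packet.

    coeffs_per_pattern[i][k] is the k-th coefficient of pattern i in the
    packet; patterns[i] is the raw pattern, used for normalization.
    """
    norm = sum(sum(v * v for v in y) for y in patterns)
    n_coeff = len(coeffs_per_pattern[0])
    return [
        sum(c[k] ** 2 for c in coeffs_per_pattern) / norm
        for k in range(n_coeff)
    ]


def packet_dif(maps):
    """Discriminant information of a packet: the J-divergence summed over
    all pairs of class energy maps."""
    total = 0.0
    for i in range(len(maps) - 1):
        for j in range(i + 1, len(maps)):
            total += j_divergence(maps[i], maps[j])
    return total
```

<p>A packet in which the classes concentrate their energy on different coefficients receives a large score, whereas a packet with identical class energy maps scores zero and contributes nothing to discrimination.</p>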

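<p>The threshold search of section 2.2 for a continuous attribute can be sketched as below. The KSD formula of the ITI model is not reproduced in this excerpt, so the score used here is an assumption: the ordinary two-sample Kolmogorov-Smirnov distance between the two class distributions, evaluated at the midpoint of each of the m - 1 intervals between distinct attribute values. The sketch handles two classes only, and the function name <code>best_threshold</code> is hypothetical.</p>

```python
def best_threshold(values, labels):
    """Pick a split threshold for one continuous attribute, two classes.

    Score at threshold t (a stand-in, not the ITI formula):
    |fraction of class-a samples below t - fraction of class-b samples below t|.
    Returns (threshold, score), or (None, -1.0) if all values are equal.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = classes
    n_a = sum(1 for c in labels if c == a)
    n_b = sum(1 for c in labels if c == b)
    distinct = sorted(set(values))
    best_t, best_score = None, -1.0
    # m distinct values give m - 1 candidate intervals; test each midpoint.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        below_a = sum(1 for v, c in zip(values, labels) if c == a and v < t)
        below_b = sum(1 for v, c in zip(values, labels) if c == b and v < t)
        score = abs(below_a / n_a - below_b / n_b)
        if score > best_score:  # strict '>' keeps the lowest threshold on ties
            best_t, best_score = t, score
    return best_t, best_score
```

<p>For example, calling <code>best_threshold</code> on an attribute whose values are well separated between a "normal" and a "fault" class returns the midpoint of the gap between the two groups together with the maximum possible score.</p>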