
Plant-wide detection and diagnosis using correspondence analysis

K.P. Detroja a, R.D. Gudi b,*, S.C. Patwardhan b

a Interdisciplinary Programme on Systems and Control Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
b Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

Abstract

This paper presents an approach based on correspondence analysis (CA) for the task of fault detection and diagnosis. Unlike other data-based monitoring tools, such as principal components analysis/dynamic PCA (PCA/DPCA), the CA algorithm has been shown to use a different metric to represent the information content in the data matrix X. Decomposition of the information represented in this metric is shown here to yield superior performance from the viewpoints of data compression, discrimination and classification, as well as early detection and diagnosis of faults. Metrics similar to the contribution plots and threshold statistics that have been developed and used for PCA are also proposed in this paper for detection and diagnosis using the CA algorithm. Further, using the benchmark Tennessee Eastman problem as a case study, significant performance improvements are demonstrated in monitoring and diagnosis (in terms of shorter detection delays, smaller false alarm rates, reduced missed detection rates and clearer diagnosis) using the CA algorithm over those achievable using the PCA and DPCA algorithms.

Keywords: Correspondence analysis (CA); Principal components analysis (PCA); Dynamic PCA (DPCA); Fault detection and diagnosis (FDD)

1. Introduction

Early detection and diagnosis of the occurrence of an abnormal event in an operating plant is very important for ensuring plant safety and maintaining product quality. Tremendous advancements in the area of advanced instrumentation have made it possible to measure hundreds of variables every few seconds. These measurements bring in useful signatures about the status of the plant operation. A wide variety of techniques for detecting faults have been proposed in the literature (Choudhury, Shah, Thornhill, & Shook, 2006; Thornhill & Horch, 2006; Xia & Howell, 2005). These techniques can be broadly classified into model-based methods and historical data based methods. While model based methods can be used to detect and isolate signals indicating abnormal operation, for large scale systems such quantitative (or qualitative) cause-effect models may be difficult to develop from first principles.

Historical data based methods for fault detection attempt to extract maximum information out of the archived data and require minimal first-principles knowledge of the plant. Due to the high dimensionality and the presence of correlated variables, multivariate statistical tools, which take the correlation amongst variables into account, are better suited for this task. Dimensionality reduction is therefore a very important thrust of historical data-based monitoring methods.

Generally, the information content in a data matrix X can be quantified in terms of a number of criteria or metrics. The most commonly used metric to measure this information content is the variance, and the multivariate analysis of the variance (MANOVA) usually yields a wealth of knowledge from the information embedded in the column (variable) space of the matrix X. Multivariate statistical tools, such as principal components analysis (PCA), are based on the decomposition of the variances and address issues related to correlation along the column or the row spaces (i.e., when X is transposed). PCA determines a lower dimensional representation of the data, in terms of capturing the data directions that canonically explain the most variance. This is done via singular value decomposition (SVD) of a suitably scaled (mean centered and variance scaled) data matrix X, retaining those principal components that have significant singular values. PCA achieves dimensionality reduction in the column space by considering the correlation amongst the variables. Further analysis of the data can then be carried out in the reduced dimensional space.

In the context of abnormal situation management, abnormal behavior needs to be detected on-line and the possible cause(s) for the abnormality then need to be isolated. This latter diagnosis of faults can be considered just as (if not more) important for the initiation of quick remedial measures. After a fault is detected, it is of paramount importance to be able to find the root cause of the fault quickly and with the least ambiguity. The challenges involved in this task are related to identifying those variables which are most relevant to diagnosing the fault, in an on-line fashion, using the operating data. In a small plant, an experienced operator may be able to diagnose the fault based on a visual examination of the variables, and by relying on plant/process knowledge acquired over time. However, it becomes challenging to perform the same task when the number of variables is large and the process is highly interacting. Attempts have been made to automate the task of fault detection and diagnosis (FDD) using the information generated in the lower dimensional space by PCA, while monitoring the process online. PCA has been used for the FDD task in broadly two ways. A majority of the reported applications have used PCA (Kresta, MacGregor, & Marlin, 1991; Nomikos & MacGregor, 1995; Zhang & Dudzic, 2006) with normal operating data to first build the statistics characterizing normal operation. During online monitoring, the projected on-line data is compared with these limits characterizing normal operation and any abnormality is highlighted. A fault is detected using either the Q or T² statistic, and contribution plots (Miller, Swanson, & Heckler, 1998) are used to help fault isolation. Contribution plots have been used to isolate a group of variables which make the greatest contribution to the deviation of the squared prediction error (SPE), i.e., the Q statistic. On the other hand, if data for fault signatures with faults of different magnitudes and directions is available from the historical records, the ability of a PCA based model to isolate faults, which can otherwise be significantly limited, could be enhanced. A relatively recent work (Chiang, Russell, & Braatz, 2000) has also used all available data, including normal and fault modes, to build PCA models. In such a case, the online monitoring is essentially a classification task and the discriminatory abilities of the algorithm are particularly important. Amongst these two methods, the former is perhaps more practical, as it does not require prior knowledge of all the fault modes. However, the latter approach is helpful towards clearer diagnosis when information on some of the faults is known.

Thus, while representation in a lower dimensional space, as is done in PCA, facilitates detection, diagnosis additionally requires resolution of the signatures of different faults. In this context, discrimination between different operating modes of a plant is an additional desirable objective over and above dimensionality reduction, particularly when the second approach to PCA monitoring described above is used. From this viewpoint, one of the shortcomings of PCA is that it is mostly representation-oriented and has minimal discriminatory capabilities, owing to the nature of its formulation. PCA uses variance as a metric of information for a data matrix and seeks to maximize the variance explained by the principal components. However, as shown in Chiang et al. (2000), there are other algorithms, such as multiple discriminant analysis, that can better discriminate between the normal and abnormal operating regions in the data, in addition to achieving dimensionality reduction. This discriminatory representation has been shown to yield smaller misclassification rates during on-line monitoring.

An important aspect that also needs to be recognized is that the variance need not be the best metric for representing and capturing the cause and effect relationships inherently embedded in the data matrix X. Usually, such cause and effect relationships are dynamic and can be more effectively analyzed by assessing the row (sample) versus column (variable) associations. This becomes difficult using the conventional PCA formulation, as it cannot handle data with serial (time) correlations. In dynamic PCA (DPCA) or in the multiple discriminant analysis (MDA) approach, such dynamic relationships require expanding the column space to generate a static map of the dynamic relationships. This latter strategy has drawbacks in terms of the larger matrix and data sizes necessary towards achieving the same level of statistical significance (Chiang, Russell, & Braatz, 2000), and it would also result in increased computational intensity.

In this paper it is proposed to address the above problems related to the task of FDD using an approach that is based on correspondence analysis (CA). CA (Greenacre, 1984; Greenacre, 1993; Hardle & Simar, 2003) is a powerful multivariate statistical tool based on the generalized SVD (GSVD). CA has been shown to be a dual analysis, as it simultaneously analyzes dependencies in the column, row and joint row-column space in a dual lower dimensional space (i.e., it can accommodate the cases when the rows are serially correlated). Thus, dynamic correlation can be represented relatively easily without having to expand the row or column spaces and deal with larger data sizes. CA primarily uses a measure of the row-column association as a representative metric and decomposes it to obtain directions in the lower dimensional space, which discriminate as well as compress information. Unlike its counterparts, such as PCA, it represents the cause–effect relationships in terms of a chi-square (χ²) value that measures the row–column associations. Since the decomposition of the χ² value takes the joint row–column association into account, it can be expected to perform better than conventionally used variance decomposition based methods, such as PCA.

In an earlier publication (Detroja, Gudi, Patwardhan, & Roy, 2006), the properties and discriminatory abilities of the basic CA algorithm were presented. The use of the CA algorithm for the task of fault detection and isolation, when fault signature data for all hypothesized faults were assumed to be available from historical records for different fault magnitudes, was demonstrated. As these requirements on the training data may be unrealistic in many situations, in this paper the CA algorithm is further analyzed from an additional perspective that is related to online monitoring using only normal operating data. Further, statistical metrics that facilitate diagnosis during online process monitoring are defined, and the CA algorithm is also shown to be superior to PCA and its dynamic counterpart for this task. The dimensionality reduction achieved using CA is shown to be more effective, as it takes the joint row-column association into account. Also, due to the special kind of scaling it employs, CA has also been shown to be able to cluster and aggregate the data more effectively (Ding, He, Zha, & Simon, 2002), thus enabling better discrimination between normal and fault regions and hence leading to smaller misclassification rates.

The objectives of this paper are: (i) to demonstrate the usefulness of CA for fault detection when only normal operating data is available for training; (ii) to define new statistics for CA which are equivalent to the Q and T² statistics proposed for PCA; (iii) to propose contribution plots for the newly defined CA based Q statistic; (iv) to evaluate and compare the performances of PCA, DPCA and CA for detecting faults in a realistic chemical process plant; and finally (v) to validate the utility of the CA contribution plots for fault diagnosis in a realistic chemical process plant. For all these objectives, operating data from an established benchmark problem, viz. the Tennessee Eastman problem (Downs & Vogel, 1993), is considered. The proposed CA based statistics are shown to perform significantly better than the existing PCA and DPCA statistics when applied to the task of online monitoring and diagnosis of the Tennessee Eastman process. The remainder of the paper is organized as follows. First, PCA and its dynamic counterpart DPCA are briefly presented to motivate the discussion of the CA algorithm. Then, the CA algorithm is described, followed by the proposed approach to fault detection and diagnosis using CA based statistics and contribution plots. Finally, PCA, DPCA and CA are applied to data collected from the Tennessee Eastman process simulator. The paper is concluded with a comparative study of the results.

2. Principal components analysis (PCA)

Any matrix X of dimension m × n, consisting of m observations and n variables collected from an operating plant, contains a wealth of information regarding the health of the plant. Before applying PCA, the data matrix X is auto-scaled by subtracting from each variable its sample mean and then dividing each variable by its standard deviation. X̄ is the scaled data matrix and x̄ is a vector containing the mean of each variable:

$$\bar{\mathbf{x}} = \left(\frac{1}{m}\right)\mathbf{X}^{\mathrm{T}}\mathbf{1}, \qquad (1)$$

$$\bar{\mathbf{X}} = \left(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^{\mathrm{T}}\right)\mathbf{D}^{-1}, \qquad (2)$$

where D is a diagonal matrix containing the standard deviation of each variable and 1 is a vector of appropriate dimension having all elements equal to 1.
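As a concrete illustration, a minimal sketch of this auto-scaling step (Eqs. (1) and (2)) in Python/NumPy is given below; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def autoscale(X):
    """Mean-center and variance-scale a data matrix (Eqs. (1)-(2)).

    X : (m, n) array of m observations of n variables.
    Returns the scaled matrix together with the column means and
    standard deviations, kept for scaling new measurements later.
    """
    x_bar = X.mean(axis=0)       # Eq. (1): sample mean of each variable
    d = X.std(axis=0, ddof=1)    # diagonal entries of D
    X_scaled = (X - x_bar) / d   # Eq. (2): (X - 1 x_bar^T) D^{-1}
    return X_scaled, x_bar, d
```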

PCA decomposes the variance in the data, based on dependencies along the columns, to achieve dimensionality reduction. PCA computes a set of new orthogonal principal directions, called loading vectors. The loading vectors are obtained by solving an optimization problem that maximizes the variance explained in the data matrix by each direction. For example, the first direction is obtained as the solution of the optimization problem in the space of the first linear combination t₁ = X̄p₁:

$$\max_{\mathbf{p}_1}\left(\mathbf{t}_1^{\mathrm{T}}\mathbf{t}_1\right) = \mathbf{p}_1^{\mathrm{T}}\bar{\mathbf{X}}^{\mathrm{T}}\bar{\mathbf{X}}\mathbf{p}_1 \quad \text{such that } \mathbf{p}_1^{\mathrm{T}}\mathbf{p}_1 = 1. \qquad (3)$$

It has been shown that the singular vector corresponding to the largest singular value provided by the SVD of X̄ is the solution to the above optimization problem. Because of correlation amongst the variables, only the first k (substantially smaller than n) loading vectors may explain most of the variance in the data. Thus, PCA decomposes the matrix X̄ as

$$\bar{\mathbf{X}} = \mathbf{t}_1\mathbf{p}_1^{\mathrm{T}} + \cdots + \mathbf{t}_k\mathbf{p}_k^{\mathrm{T}} + \mathbf{E} \qquad (4)$$

such that

$$\mathbf{P}^{\mathrm{T}}\mathbf{P} = \mathbf{I}, \qquad (5)$$

where P = [p₁ … pₙ] contains the loading vectors and T = [t₁ … tₙ] is called the scores matrix. The matrix E contains the component of the variance of the matrix X̄, such as noise, which cannot be explained by T_k P_k^T (retaining only the first k (k ≪ n) columns), and is also known as the residual matrix.
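Continuing the sketch, the decomposition of Eqs. (3)–(5) can be realized through the SVD of the scaled matrix; the number of retained components k is assumed given (the helper names are illustrative):

```python
import numpy as np

def pca_model(X_scaled, k):
    """PCA of a scaled data matrix via the SVD (Eqs. (3)-(5)).

    Returns the first k loading vectors P_k (n, k), the scores
    T_k (m, k), the residual matrix E of Eq. (4) and all singular
    values (needed later for the Q control limit of Eq. (9)).
    """
    # Columns of V are the loading vectors, ordered by decreasing
    # singular value, i.e., by the variance each direction explains.
    U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    P_k = Vt[:k].T
    T_k = X_scaled @ P_k               # scores: t_i = X_scaled p_i
    E = X_scaled - T_k @ P_k.T         # unexplained (residual) part
    return P_k, T_k, E, s
```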

2.1. Fault detection using PCA

The statistical model developed using PCA from the normal operating data can be used for the purpose of online monitoring and fault detection. When employed online, new scores are obtained by projecting the new measurements onto the loading vectors. Normal operation of the plant can be characterized by Hotelling's T² statistic (Eq. (6)), based on the first k loading vectors (principal components) retained. The status of the plant is considered normal if the value of the T² statistic stays within its control limit:

$$T^2 = \mathbf{x}^{\mathrm{T}}\mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^{\mathrm{T}}\mathbf{x}, \qquad (6)$$

where x is the new measurement vector and Λ is a diagonal matrix containing the first k eigenvalues of the covariance matrix of X̄.

The control limit (threshold) for the T² statistic, T²_α, can be calculated from Eq. (7) (Ku, Storer, & Georgakis, 1995). A value of the T² statistic greater than the control limit T²_α indicates the occurrence of a fault:

$$T_\alpha^2 = \frac{(m-1)k}{m-k}\,F_\alpha(k, m-k), \qquad (7)$$

where F_α(k, m−k) is the upper 100α% critical point of the F-distribution with k and m−k degrees of freedom.

However, monitoring only the T² statistic is not sufficient, as it only detects variation in the direction of the first k PCs. Variation in the space corresponding to the remaining (n−k) PCs (having the smallest associated singular values) can also be monitored, using the Q statistic (Jackson & Mudholkar, 1979). The value of the Q statistic and its control limit can be calculated as follows:

$$Q = \left[\left(\mathbf{I} - \mathbf{P}\mathbf{P}^{\mathrm{T}}\right)\mathbf{x}\right]^{\mathrm{T}}\left[\left(\mathbf{I} - \mathbf{P}\mathbf{P}^{\mathrm{T}}\right)\mathbf{x}\right], \qquad (8)$$

$$Q_\alpha = \theta_1\left[\frac{h_0 c_\alpha\sqrt{2\theta_2}}{\theta_1} + 1 + \frac{\theta_2 h_0(h_0 - 1)}{\theta_1^2}\right]^{1/h_0}, \qquad (9)$$

where θ_i = Σ_{j=k+1}^{n} (μ_j²)^i, h₀ = 1 − 2θ₁θ₃/(3θ₂²), c_α is the normal deviate corresponding to the (1−α) percentile and μ_j is the jth singular value. When a fault occurs that results in a change in the covariance structure of the normal operating data, it is reflected in a high Q value.
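The two detection statistics and their control limits (Eqs. (6)–(9)) might be computed as sketched below; SciPy is assumed for the F-distribution and normal quantiles, and eigvals denotes the first k eigenvalues of the covariance matrix of the scaled data (λ_j = s_j²/(m−1) in terms of the singular values). The function names are illustrative:

```python
import numpy as np
from scipy import stats

def t2_statistic(x, P_k, eigvals):
    """Hotelling's T^2 of Eq. (6)."""
    t = P_k.T @ x                       # scores of the new measurement
    return float(np.sum(t**2 / eigvals))

def t2_limit(m, k, alpha=0.01):
    """Control limit of Eq. (7)."""
    return (m - 1) * k / (m - k) * stats.f.ppf(1 - alpha, k, m - k)

def q_statistic(x, P_k):
    """Q (squared prediction error) statistic of Eq. (8)."""
    res = x - P_k @ (P_k.T @ x)         # (I - P P^T) x
    return float(res @ res)

def q_limit(sv, k, alpha=0.01):
    """Control limit of Eq. (9); sv holds all singular values of
    the scaled data matrix, so sv[k:] are the discarded ones."""
    theta = [np.sum(sv[k:] ** (2 * i)) for i in (1, 2, 3)]
    h0 = 1 - 2 * theta[0] * theta[2] / (3 * theta[1] ** 2)
    c = stats.norm.ppf(1 - alpha)       # normal deviate c_alpha
    return theta[0] * (h0 * c * np.sqrt(2 * theta[1]) / theta[0]
                       + 1 + theta[1] * h0 * (h0 - 1) / theta[0] ** 2) ** (1 / h0)
```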

2.2. Fault diagnosis using PCA

Once a fault is detected, the next step is to find the origin of the fault. This is particularly difficult when the number of variables is large. The occurrence of a fault can cause many variables to deviate from their nominal values under the influence of a controller when the plant is operated in closed loop. The objective of any fault identification technique is to determine which of these variables are most relevant to the diagnosis of the fault. Contribution plots (Miller et al., 1998) are generally used in PCA based process monitoring schemes to achieve fault diagnosis. Contribution plots quantify the contribution of each process variable to the 'out-of-control' status of the plant. The groups of variables which show a higher contribution to the residuals are isolated as most relevant to the fault that has occurred.

Several other approaches, based on multiple models, have also been proposed in the literature (Himes, Storer, & Georgakis, 1994). One such approach involves developing a separate PCA model for every fault situation. Monitoring is then performed based on the statistics for the normal mode and the different fault modes of operation of the plant. When a fault (say, fault i) occurs, the statistics for all but the ith fault model would go out of their control limits. This approach, however, requires data for all possible fault modes of operation.

2.3. Dynamic PCA (DPCA)

Monitoring using PCA statistics implicitly assumes that the measurements at any time instant are statistically independent of the measurements at past time instances. This assumption is generally not valid for most processes, due to the dynamics of the plant. The presence of auto- and cross-correlations results in a row-column association, which needs to be modeled to describe the variations completely. Towards this end, the PCA method can be extended to take these correlations into account by augmenting each observation vector with a few past observations and stacking the data into a bigger matrix:

$$\mathbf{X}_A = \left[\mathbf{X}(t)\ \ \mathbf{X}(t-1)\ \ \ldots\ \ \mathbf{X}(t-l)\right]. \qquad (10)$$

By performing PCA on the augmented data matrix X_A, a multivariate autoregressive (AR) model is extracted directly from the data (Ku et al., 1995). Expectedly, however, this method requires working with considerably larger data matrices than conventional PCA to achieve the same level of statistical significance (Chiang et al., 2000). The T² and Q statistics and their control limits proposed for PCA can be generalized directly to DPCA.

Fault diagnosis using DPCA is done based on a model bank constructed from the available historical data. A DPCA model is built for each fault case in addition to the normal operating data. This task of building a model bank requires the availability of data for all possible fault scenarios. It is important to note here that the archived data do not necessarily cover all possible fault scenarios.
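A small sketch of the data augmentation in Eq. (10), with l past observations stacked alongside each current observation (a hypothetical helper, not from the paper):

```python
import numpy as np

def augment(X, l):
    """Build the lag-augmented matrix X_A of Eq. (10).

    Each row of X_A stacks the observation at time t together with
    the l previous observations, so an (m, n) matrix becomes an
    (m - l, n * (l + 1)) matrix on which ordinary PCA is performed.
    """
    m = X.shape[0]
    blocks = [X[l - j : m - j] for j in range(l + 1)]  # X(t), X(t-1), ..., X(t-l)
    return np.hstack(blocks)
```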

3. Correspondence analysis

The aim of correspondence analysis is to develop simple indices that highlight associations between the rows and the columns. Unlike PCA, which canonically decomposes the total variance in the matrix X, CA decomposes a measure of row-column association, typically formulated as the total χ² value, to capture the dependencies. Conventionally, in PCA, the data in the matrix X are scaled to unit variance. A classification analysis of the data resulting from this linear scaling is, however, not effective when the inherent structure or association between the variables is nonlinear. Ding et al. (2002) have shown that nonlinear scaling of data leads to a space having desirable properties such as self-aggregation. CA also employs a similar nonlinear scaling of the data and hence brings in the associated advantages. Thus, the lower dimensional space obtained using CA is expected to be more discrimination oriented and better suited for online process monitoring.

The formulation of the CA algorithm can be presented in terms of a weighted Euclidean space as follows. In general, the objective of CA would be to seek a lower dimensional (say k) approximation of the matrix X in an appropriate space S. In CA, the closeness of the k-dimensional approximation can be measured in terms of the proximity of S to the row or column clouds. In terms of the row and column points, each row of X can be represented as a point x_i (i = 1, 2, …, m) in an n-dimensional space and will constitute the row cloud. Likewise, one could also visualize a set of column points x_j (j = 1, 2, …, n), which would constitute the column cloud in an m-dimensional space. When one seeks to estimate a lower dimensional space (approximation) S that is closest to X in terms of its proximity to the cloud of row and column points, one could solve optimization problems formulated in several possible ways. One such optimization problem to determine the space S is to minimize a weighted Euclidean distance defined as

$$d^2 = (\mathbf{x} - \bar{\mathbf{x}})^{\mathrm{T}}\mathbf{D}(\mathbf{x} - \bar{\mathbf{x}}). \qquad (11)$$

The choice of the matrix D is very critical to the resulting solution of the optimization problem. In CA, the weighting matrix for the distances from the row (column) cloud is chosen to be a diagonal matrix containing the row (column) sums, D_r (D_c), as defined below. For the data matrix X, let the sum of all the elements be denoted as g, and further let the vectors r and c denote the vectors of row sums and column sums of [(1/g)X], respectively, given by Eqs. (12) and (13):

$$\mathbf{r} = \left[\left(\frac{1}{g}\right)\mathbf{X}\right]\mathbf{1}, \qquad (12)$$

$$\mathbf{c} = \left[\left(\frac{1}{g}\right)\mathbf{X}\right]^{\mathrm{T}}\mathbf{1}, \qquad (13)$$

where 1 is a vector of all 1's of appropriate dimension. The matrix D_r is then defined as D_r = diag(r) and, similarly, D_c = diag(c). Greenacre (1984) defined the overall variation of a cloud of points by their total inertia. The inertia for the row cloud and the column cloud is defined as

$$IN(\mathrm{row}) = \operatorname{trace}\left[\mathbf{D}_r\left(\mathbf{R} - \mathbf{1}\mathbf{c}^{\mathrm{T}}\right)\mathbf{D}_c^{-1}\left(\mathbf{R} - \mathbf{1}\mathbf{c}^{\mathrm{T}}\right)^{\mathrm{T}}\right], \qquad (14)$$

$$IN(\mathrm{column}) = \operatorname{trace}\left[\mathbf{D}_c\left(\mathbf{C} - \mathbf{1}\mathbf{r}^{\mathrm{T}}\right)\mathbf{D}_r^{-1}\left(\mathbf{C} - \mathbf{1}\mathbf{r}^{\mathrm{T}}\right)^{\mathrm{T}}\right], \qquad (15)$$

where R = D_r^{-1}[(1/g)X] and C = D_c^{-1}[(1/g)X^T]; it has also been shown that the total inertia follows a χ² distribution (Hardle & Simar, 2003). It has further been shown (Greenacre, 1984) that the solution to the problem of minimizing the weighted distances in Eq. (11) is given by decomposition of the inertia associated with the row (or column) cloud, i.e., by the generalized SVD of the matrix [(1/g)X − rc^T]. The generalized SVD of this matrix can be written as

$$\left(\frac{1}{g}\right)\mathbf{X} - \mathbf{r}\mathbf{c}^{\mathrm{T}} = \mathbf{A}\mathbf{D}_\mu\mathbf{B}^{\mathrm{T}}, \qquad (16)$$

such that

$$\mathbf{A}^{\mathrm{T}}\mathbf{D}_r^{-1}\mathbf{A} = \mathbf{I}_{m\times m}, \qquad \mathbf{B}^{\mathrm{T}}\mathbf{D}_c^{-1}\mathbf{B} = \mathbf{I}_{n\times n}, \qquad (17)$$

where D_μ is a diagonal matrix of singular values. It is important here to note the differences between the constraints for CA (Eq. (17)) and those of PCA (Eq. (5)). Thus, Eq. (16) provides a decomposition of the inertia rather than the variance of the data matrix X.

The generalized SVD results of Eq. (16) can also be realized via the regular SVD of an appropriately scaled matrix X, as explained below. The matrix P is defined as

$$\mathbf{P} = \mathbf{D}_r^{-1/2}\left[\left(\frac{1}{g}\right)\mathbf{X} - \mathbf{r}\mathbf{c}^{\mathrm{T}}\right]\mathbf{D}_c^{-1/2}. \qquad (18)$$

Then, the regular SVD of the matrix P gives the required singular vectors for the row cloud. As described earlier, the matrix X may be considered either as a set of row points or a set of column points, i.e., as a row cloud or a column cloud. While finding the principal axes for the column cloud, the above discussion holds with X replaced by X^T. The problems of finding the principal axes for the row cloud and the column cloud are dual to each other, and A and B define the principal axes for the column cloud and the row cloud, respectively. The principal axes are equivalent to the principal components (PCs) of PCA, and the matrix B can be called the loadings for the row cloud. Further details and interpretations of the CA algorithm are provided in Detroja et al. (2006).
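A sketch of the CA decomposition (Eqs. (12), (13) and (16)–(18)) via the regular SVD of the scaled matrix P; the interface is illustrative and assumes a non-negative data matrix:

```python
import numpy as np

def ca_model(X, k):
    """Correspondence analysis of a non-negative data matrix,
    retaining k principal axes.

    Returns the row/column masses r, c, the retained generalized
    singular vectors A, B of Eq. (16) and the singular values mu.
    """
    g = X.sum()
    N = X / g                          # correspondence matrix (1/g) X
    r = N.sum(axis=1)                  # row masses, Eq. (12)
    c = N.sum(axis=0)                  # column masses, Eq. (13)
    # Standardized residual matrix P of Eq. (18)
    P = (N - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, mu, Vt = np.linalg.svd(P, full_matrices=False)
    # Undo the D_r^{-1/2}, D_c^{-1/2} scaling to recover the
    # generalized singular vectors satisfying Eq. (17)
    A = np.sqrt(r)[:, None] * U[:, :k]
    B = np.sqrt(c)[:, None] * Vt[:k].T
    return r, c, A, B, mu[:k]
```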

3.1. Singular values and inertia explained

The sum of the squared singular values gives the total inertia of the cloud. The inertia explained by each principal axis can then be computed as

$$IN(i\text{th axis}) = \frac{\mu_i^2}{\sum_{j=1}^{n}\mu_j^2}, \qquad (19)$$

where μ_i is the ith singular value. Similarly, the cumulative inertia explained up to the ith principal axis is the sum of the inertias explained up to that principal axis. This gives a measure of accuracy (or quality of representation) of the lower dimensional approximation. A matrix of rank k (k ≪ m, n) will have only k significant singular values, the remaining singular values being trivial. In practice, however, a singular value exactly equal to zero is rarely encountered, and hence a judicious choice of the value of k needs to be made, as explained in the next subsection.

3.2. Number of PCs to be retained

Although several mathematical criteria exist for selecting the number of principal axes, there is no generally fixed criterion to determine how many principal axes should be retained. The cumulative inertia explained can be a good candidate for selecting the number of PCs to be retained. A plot of the variance explained versus the number of principal axes (also known as a SCREE plot) can also suggest a good choice: principal axes are retained until adding another principal axis does not improve the quality of representation significantly. In general, a major part of the inertia can be explained by retaining only the first k (k ≪ m, n) principal axes corresponding to the largest singular values. The co-ordinates (scores) of the row profile points and column profile points for the new principal axes can be computed by projection on A and B (only the first k columns are retained), respectively:

$$\mathbf{F} = \mathbf{D}_r^{-1}\mathbf{A}\mathbf{D}_\mu, \qquad (20)$$

$$\mathbf{G} = \mathbf{D}_c^{-1}\mathbf{B}\mathbf{D}_\mu. \qquad (21)$$

The matrix F gives the new row co-ordinates for the row cloud, which is equivalent to the scores in PCA. Table 1 draws an analogy between the various terms used in the PCA and CA literature. The statistical model thus obtained using CA gives an inertial decomposition and is shown here to be better suited for process monitoring. In the next section, a CA based process monitoring scheme is proposed that is equivalent to PCA based multivariate process monitoring.

Table 1
Analogy between PCA and CA

                             PCA                                   CA
Pre-analysis                 Mean-centering and variance scaling   Make variables dimensionally
                             (recommended in the literature)       homogeneous (0-100%)
Information representation   Variance                              Inertia
Rows*                        Samples                               Row profile
Columns*                     Variables                             Column profile
New directions*              Loading vectors                       Principal axes
New co-ordinates*            Scores                                Row co-ordinates
Decomposition strategy       SVD                                   Generalized SVD

*These terms can be used interchangeably.

Remark: Ding et al. (2002) proposed a nonlinear scaling of the standard PCA algorithm, which resulted in enhanced discrimination of the data over that obtained using standard PCA. The scaling involves finding a similarity matrix from the data matrix for either the row or the column space. The similarity matrix is then scaled by a diagonal matrix D containing the row sums. However, this approach was oriented towards discrimination and was not intended to capture the information present in the row-column associations. The latter has been the focus of the CA algorithm and results in a different way of scaling the (suitably transformed) data matrix. Exploiting this row-column association also results in superior discrimination between different classes of data sets.

4. Proposed process monitoring scheme using CA

Correspondence analysis has been used to build statistical models for ecological problems, studies of the vegetation habits of species, social networks, etc. Here, it is proposed to build the statistical model for the plant data pertaining to normal operation using CA. As discussed earlier, CA takes the joint row-column association into account while decomposing the inertia. CA has also been shown to give better aggregation and clustering (Detroja et al., 2006). CA also scores over PCA, which assumes statistical independence of the samples (rows), as well as DPCA, which requires augmentation of the data matrix.

The CA algorithm requires all the elements of the data matrix to be non-negative and all variables to be dimensionally homogeneous. These requirements can be very easily satisfied if all the variables (measured and manipulated) are considered as 0–100%. The manipulated variables are generally computed in terms of percentages depending on the operating ranges of the actuators. For the measured variables, ranges are normally fixed by operating ranges, process constraints or physical limits. If the variables are measured in some different form, then pre-processing of the data is required, where each variable is first converted into a percentage depending on its range. A statistical model based on the CA algorithm can then be built for normal plant operation.

Once the statistical model is built from the normal operation data, the next task is to define control limits, which can be used for the purpose of online statistical process monitoring of the plant. Motivated by the Q and T² statistics used in PCA and DPCA, similar statistics are defined here for CA.

For online process monitoring, when a new measurement arrives, it is projected onto the PCs to obtain the new row scores (co-ordinates). The new measurement vector x is given by

$$\mathbf{x} = \left[x_1\ \ x_2\ \ \ldots\ \ x_m\right]^{\mathrm{T}}. \qquad (22)$$

The row sum of this measurement vector, r, is given by

$$r = \sum_{i=1}^{m} x_i, \qquad (23)$$

and the new row scores can be obtained as

$$\mathbf{f} = \left(\frac{1}{r}\mathbf{x}^{\mathrm{T}}\mathbf{G}\mathbf{D}_\mu^{-1}\right)^{\mathrm{T}}. \qquad (24)$$

4.1. T² statistic for CA

Hotelling's T² statistic effectively captures the normal operating region for multivariate data in PCA. For the statistical models built using CA, a similar statistic can be used to characterize normal plant behavior. The T² value for the CA model is defined as in Eq. (25):

$$T^2 = \mathbf{f}^{\mathrm{T}}\mathbf{D}_\mu^{-2}\mathbf{f}, \qquad (25)$$


where D_μ contains the first k largest singular values, which were retained.

In the case of PCA, it can be shown that t (the scores vector) follows a t-distribution. However, in the case of CA, the scores f do not follow any standard distribution. To be able to characterize the normal operation control limits, a simplifying assumption that f follows a t-distribution is made here, and the control limit (threshold) for the CA based T² statistic, T²_α, can be calculated from Eq. (7).

4.2. Q statistic for CA

As explained earlier, monitoring the plant using only the T² statistic is not adequate for fault detection, as it only monitors the variation along the principal axes that were retained in the statistical CA model. Any significant deviation in the direction of the remaining n−k PCs (corresponding to the smallest singular values) is also indicative of a fault. This deviation, i.e., the residual vector, can be calculated for the new measurement x as

$$\mathbf{Res} = \mathbf{B}\mathbf{f} - \left(\frac{1}{r}\mathbf{x} - \mathbf{c}\right). \qquad (26)$$

The value of the Q statistic for CA is defined as in Eq. (27):

$$Q = \mathbf{Res}^{\mathrm{T}}\mathbf{Res}. \qquad (27)$$

The control limit for the Q statistic is chosen as the 95% confidence limit from the normal operating residual values.

Correspondence analysis, along with the statistics defined here, can be very useful in fault detection. In the next section, the usefulness and superiority of CA for fault detection and diagnosis is demonstrated via a comparison of the performance of statistics based on CA, PCA and DPCA.
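A sketch of the online monitoring computation (Eqs. (22)–(27)), combining the projection of a new measurement with both CA statistics; it assumes the c, B, G and mu arrays from the model-building sketches above (names are illustrative):

```python
import numpy as np

def ca_monitor(x, c, B, G, mu):
    """Project a new measurement and evaluate the CA based
    T^2 and Q statistics.

    x : new measurement vector, non-negative and dimensionally
        homogeneous (0-100%), as required by CA.
    """
    r_sum = x.sum()                     # row sum, Eq. (23)
    f = (x / r_sum) @ G / mu            # row scores, Eq. (24)
    T2 = float(np.sum((f / mu) ** 2))   # Eq. (25): f^T D_mu^{-2} f
    res = B @ f - (x / r_sum - c)       # residual vector, Eq. (26)
    Q = float(res @ res)                # Eq. (27)
    return f, T2, Q, res
```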


4.3. Fault diagnosis using CA

Identifying the group of variables that are most relevant to the fault that has occurred is equally important. Conventionally, in PCA, this is done using contribution plots, as explained earlier. Similar to the contribution plots for the Q statistic in PCA, contribution plots can be obtained for the CA based Q statistic defined in the above section. The contribution of each variable to the residual vector is defined as

$$\mathrm{Con}_i = \frac{\mathrm{Res}_i^2}{\sum_{j=1}^{n}\mathrm{Res}_j^2}. \qquad (28)$$

Variables showing a greater contribution towards the residual are considered closely associated with the fault.
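The corresponding computation is immediate; a short illustrative helper is sketched below (the 5% cut-off shown in the comment is the threshold used later in the case study):

```python
import numpy as np

def contributions(res):
    """Per-variable contribution to the CA residual vector, Eq. (28)."""
    return res ** 2 / np.sum(res ** 2)

# Example: flag variables whose contribution exceeds a threshold
# (a 5% threshold is used in the case study of Section 5):
# con = contributions(res)
# flagged = np.where(con > 0.05)[0]
```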

5. Application to Tennessee Eastman challenge problem

The Tennessee Eastman process proposed by Downs and Vogel (1993) has been a benchmark problem for plant-wide control strategy and fault detection (Russell et al., 2000). The test problem is based on an actual chemical process where only the components, kinetics and operating conditions were modified for proprietary reasons. Fig. 1 shows a diagram of the process. The gaseous reactants A, C, D, and E and the inert B are fed to the reactor, where the liquid products G and H are formed. The reactions in the reactor are irreversible, exothermic, and approximately first-order with respect to the reactant concentrations. The reactor product stream is cooled through a condenser and then fed to a vapor–liquid separator. The vapor exiting the separator is recycled to the reactor feed through the compressor. A portion of the recycle stream is purged to keep the inert and byproducts from accumulating in the process. The condensed components from the separator (Stream 10) are pumped to the stripper. Stream 4 is used to strip the remaining reactants in Stream 10, and is combined with the recycle stream. The products G and H exiting the base of the stripper are sent to a downstream process, which is not included in this problem. The process contains 41 measured and 12 manipulated variables. Measurements are taken every 3 min.

Fig. 1. A diagram of the Tennessee Eastman process simulator.

The Tennessee Eastman simulation (Downs & Vogel, 1993) allows 21 pre-programmed major process disturbances, as shown in Table 2. Sixteen of these faults are characterized as known and the other five as unknown. Faults IDV 1–7 are associated with a step change in a process variable. Faults IDV 8–12 are associated with an increase in the variability of some process variables. Fault IDV 13 is a slow drift in the reaction kinetics, and faults IDV 14, 15 and 21 are associated with sticking valves. The plant-wide control structure recommended in Lyman and Georgakis (1995), which was used by Russell et al. (2000) for their study of fault detection using PCA and DPCA, was also used here to generate the closed-loop simulated process data for each fault.

Table 2
Process faults for the Tennessee Eastman process simulator

Fault     Description                                                    Type
IDV(1)    A/C feed ratio, B composition constant (Stream 4)              Step
IDV(2)    B composition, A/C ratio constant (Stream 4)                   Step
IDV(3)    D feed temperature (Stream 2)                                  Step
IDV(4)    Reactor cooling water (RCW) inlet temperature                  Step
IDV(5)    Condenser cooling water (CCW) inlet temperature                Step
IDV(6)    A feed loss (Stream 1)                                         Step
IDV(7)    C header pressure loss–reduced availability (Stream 4)         Step
IDV(8)    A, B, C feed composition (Stream 4)                            Random
IDV(9)    D feed temperature (Stream 2)                                  Random
IDV(10)   C feed temperature (Stream 4)                                  Random
IDV(11)   RCW inlet temperature                                          Random
IDV(12)   CCW inlet temperature                                          Random
IDV(13)   Reactor kinetics                                               Slow drift
IDV(14)   RCW valve                                                      Sticking
IDV(15)   CCW valve                                                      Sticking
IDV(16)   Unknown                                                        —
IDV(17)   Unknown                                                        —
IDV(18)   Unknown                                                        —
IDV(19)   Unknown                                                        —
IDV(20)   Unknown                                                        —
IDV(21)   The valve for Stream 4 was fixed at the steady state position  Constant position

Table 3
Comparison of extent of separation

       PCA      Scaled PCA   CA
EoS    1.0015   1.0648       1.3216

The data sets used for all the simulation cases in this manuscript were generated from the simulation code downloaded from http://brahms.scs.uiuc.edu.

The statistical models were built from normal operation data consisting of 500 samples. All manipulated and measurement variables, except the agitation speed of the reactor stirrer, were used, making a total of 52 variables. The data was sampled every 3 min. Twenty-one testing sets were generated using the pre-programmed faults (IDV 1–21), along with the normal operation data.

5.1. Comparison of discriminative ability: scaled PCA vs. CA

In this subsection, we present a comparison of the discriminative abilities of scaled PCA (Ding et al., 2002) and CA. As mentioned earlier, the nonlinear scaling introduced by Ding et al. promotes self-aggregation and helps bring points belonging to the same class of operation closer to each other, resulting in better separation of the different classes of data. This aggregation property is also expected to result in better discriminatory ability. In the following, an example is presented to demonstrate and compare the performance of PCA, scaled PCA and CA on this task of discriminating different classes of data. This example deals with discriminating rows (samples).

However, an important point that needs to be kept in mind is that, since these methods use entirely different scaling schemes, the results cannot be compared directly in absolute terms. Another important requirement is that of quantifying the extent of overlap between clusters, and thus the discrimination obtained in the lower dimensional space. Here, this task is performed by calculating: (i) the between-class and (ii) the within-class variance matrices (similar to the S_b and S_w matrices obtained for FDA; see Duda, Hart, & Stork, 2003). Comparison of these matrices was done based on the trace of the matrix, which is representative of the total scatter in the dimensions (PCs) that are retained. As said earlier, these metrics cannot be compared directly due to the different scales; it is therefore required to normalize this metric by the largest singular value of the respective matrix. Here, the extent of separation (EoS) is defined to compare the discriminatory ability of these methods:

$$\mathrm{EoS} = \frac{\operatorname{trace}(\mathbf{S}_b)}{\max\left(\operatorname{eig}(\mathbf{S}_b)\right)}. \qquad (29)$$

As is evident from the name, a larger value of EoS, which represents how distant the different groups are from each other, would mean possibly better discrimination between groups.

To illustrate the above concept, the simulation data of the benchmark Tennessee Eastman process is considered here. Three classes of operation are considered: (i) normal operation; (ii) fault IDV-1; and (iii) fault IDV-2. The task is to discriminate between these regions of operation so as to perform fault detection and isolation.
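A sketch of how EoS might be computed from class-labelled scores in the retained lower dimensional space; the construction of S_b below follows the standard FDA between-class scatter (Duda et al., 2003), which is an assumption, since the paper does not spell out the exact computation:

```python
import numpy as np

def extent_of_separation(scores, labels):
    """Extent of separation, Eq. (29), from class-labelled scores.

    scores : (N, k) array of lower dimensional co-ordinates.
    labels : (N,) array of class labels (e.g., 0 = normal, 1 = IDV-1).
    """
    overall = scores.mean(axis=0)
    Sb = np.zeros((scores.shape[1], scores.shape[1]))
    for cls in np.unique(labels):
        Xc = scores[labels == cls]
        d = (Xc.mean(axis=0) - overall)[:, None]
        Sb += Xc.shape[0] * (d @ d.T)   # weighted between-class scatter
    # trace(S_b) normalized by the largest eigenvalue of S_b
    return float(np.trace(Sb) / np.max(np.linalg.eigvalsh(Sb)))
```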

Table 4
Detection delays (in minutes)

Fault     PCA Q   DPCA Q   CA Q   PCA T2   DPCA T2   CA T2
IDV(1)    9       15       6*     21       18        33
IDV(2)    36      39       33*    51       48        36
IDV(4)    9       3*       3*     —        453       —
IDV(5)    3*      6        3*     48       6         27
IDV(6)    3*      3*       3*     30       33        24
IDV(7)    3*      3*       3*     3*       3*        3*
IDV(8)    60*     63       60*    69       69        87
IDV(10)   147     150      81*    288      303       186
IDV(11)   33      21*      33     912      585       567
IDV(12)   24      24       9*     66       9*        63
IDV(13)   111*    120      123    147      135       147
IDV(14)   3*      3*       3*     12       18        —
IDV(16)   591     588      30*    936      597       108
IDV(17)   75      72       66*    87       84        468
IDV(18)   252     252      225*   279      279       288
IDV(19)   —       246*     441    —        —         —
IDV(20)   261     252      222*   261      267       252
IDV(21)   855     858      780*   1689     1566      1527

—: fault not detected by the statistic. *Minimum detection delay for each fault.

The PCA, scaled PCA and CA algorithms were applied to this data set and the EoS was calculated; a comparative result is presented in Table 3. As can be seen from Table 3, the EoS values for scaled PCA are better than those for regular PCA. The CA algorithm, however, gives a better EoS than both PCA and scaled PCA. It must be noted that the CA algorithm gives better discrimination and hence can be expected to result in lower misclassification rates. This is also evident from Fig. 2(c), as all the clusters are well separated from each other with only two PCs. Thus CA has very good discriminatory ability, as is also evident from the lower dimensional representations generated using PCA, scaled PCA and CA (Fig. 2(a)–(c)).

5.2. Number of PCs to be retained

The normal operation data was used to build statistical models using PCA, DPCA and CA. Data compression is an important aspect of multivariate statistical tools. The number of PCs to be retained in PCA can be determined via several criteria, such as cross-validation or the scree test. Fig. 3(a) shows the scree plot for the PCA model. Earlier work (Russell et al., 2000) retained 11 PCs, which explained approximately 55% of the variance in the data. For DPCA, Chiang et al. (2000) reported that 29 PCs are necessary to model the cause-effect relationships.

The CA algorithm was applied to the dimensionally homogeneous data set with all the variables expressed in 0–100%, based on the operating ranges for each variable. When the analysis was done using CA, it was found that, when 12 PCs were retained, 95.78% of the total inertia was effectively captured (see Fig. 3(b)). It should be noted here that these values of the variance explained by PCA and the inertia explained by CA cannot be directly compared. Nevertheless, the representation given by CA would appear to be better as, through the modeling of the row-column associations, it is better able to capture the inter-relationships between variables and samples. As explained earlier, the statistical model obtained via CA can better aggregate the normal operating data because of the nonlinear scaling employed (Ding et al., 2002). This aggregation property helps in better discrimination between different regions of operation, and is thus expected to give smaller misclassification rates as well as quicker detection of the out-of-control status of the plant. These results are discussed in detail in the following subsections.

5.3. Fault detection using CA based statistics

The objective of a fault detection technique is that it should be: (i) relatively independent of the training set; (ii) sensitive to all the possible faults of the process; and (iii) prompt in detecting the fault. Since false alarms are inevitable, an out-of-control value of a statistic can be the result of a fault or of a false alarm. In order to decrease the rate of false alarms, a fault can be indicated only when several consecutive values of a statistic have exceeded the threshold. In this study, a fault is indicated only when six consecutive statistic values have exceeded the control limit, and the detection delay is recorded as the first time instance at which the threshold was exceeded. This was done exactly in accordance with what has been reported by Russell et al. (2000), so that the results can be compared. Russell et al. (2000) reported that the missed detection rates for faults 3, 9, and 15 were fairly high, because no observable change in the mean or the variance could be detected by visually comparing the plots of each associated observation variable. While PCA is known to have limitations in detecting faults that do not involve mean changes, this issue needs to be further analyzed in the context of CA. Therefore, these faults are not considered when comparing the methods. In the following subsection, the performances of the proposed CA based statistics are compared with (D)PCA in terms of: (i) detection delays, (ii) false alarm rates and (iii) missed detection rates.

5.4. Efficiency of detection

(a) Detection delays

The detection delays (in minutes) for all 18 faults (excluding faults 3, 9 and 15) are tabulated in Table 4; for each fault, the statistic having the minimum detection delay is marked with an asterisk. All faults could be detected by the statistics defined based on CA. It can also be seen that the Q and T² statistics based on CA performed significantly better than the statistics based on PCA, in terms of shorter detection delays. The DPCA based statistics are expected to perform (and did perform) better than the PCA statistics. The proposed CA based statistics performed even better than the DPCA statistics.

Fig. 2. Scores plots for Tennessee Eastman simulation data. (a) PCA scores plot; (b) scaled PCA scores plot; (c) CA scores plot.

The CA based Q statistic presented here is relatively faster in detecting faults when compared with the statistics generated via PCA and DPCA. An important observation needs to be made in relation to Fault 16 (IDV-16). As seen in Table 4, the detection delays for this fault were 591 and 588 min for the Q-PCA and Q-DPCA statistics, respectively. However, the Q-CA statistic proposed here is seen to detect the fault much more rapidly (30 min versus 588 min for DPCA). The reason why PCA is not able to detect the fault quickly is evident from Fig. 4(a), which shows the Q-PCA plot for fault IDV-16: the deviation from normal operation remains small for a long period after the introduction of the fault. However, as can be seen from Fig. 4(b), the Q-CA plot for fault IDV-16 is easily able to differentiate between normal operation and the fault mode soon after the fault occurred. Detection delays are also seen to be reduced considerably for the other fault cases when using the CA algorithm, as compared with the PCA algorithm (see Table 4).

(b) False alarm rates

It is very important to complement detection delays with false alarm rates: many times, reduced detection delays might actually result in increased false alarm rates. The effectiveness of each statistic was thus determined by calculating the false alarm rate under normal operating conditions for the training and testing sets. A large number of data sets belonging to normal operation were generated, and the false alarm rates were calculated for the 99% confidence limit. The results obtained from this exercise are tabulated in Table 5. It can be seen that the false alarm rates for the testing set are much higher for the PCA and DPCA based statistics. The proposed CA based statistics generated significantly fewer false alarms compared to the (D)PCA based statistics. Amongst the proposed CA based statistics, the CA based T² statistic had a higher false alarm rate compared to the CA based Q statistic. Thus, the proposed CA based statistics performed much better compared to the conventional (D)PCA based statistics and are much more effective in fault detection.
Fig. 3. (a) Variance explained by each PC computed via PCA. (b) Inertia explained by each PC computed via CA.

Fig. 4. (a) PCA based residual plot for IDV-16. (b) CA based residual plot for IDV-16.

Table 5
False alarm rates for the training and testing sets

Statistic   Training set   Testing set
PCA Q       0.004          0.016
PCA T2      0.002          0.014
DPCA Q      0.004          0.281
DPCA T2     0.002          0.006
CA Q        0              0.0003787
CA T2       0              0.0021



(c) Missed detection rates

Missed detection rate is another important criterion for comparing different fault detection statistics. Missed detection rates are computed by setting the threshold at the tenth highest value (for every thousand samples) under normal operating conditions; these adjusted thresholds correspond to the 99% confidence limit. As mentioned earlier, the missed detection rates for faults 3, 9 and 15 are fairly high and, hence, these faults are excluded from the comparison. The missed detection rates for all 21 fault cases except faults 3, 9 and 15 are tabulated in Table 6; the minimum missed detection rate for each fault is marked with an asterisk. As can be seen from Table 6, the proposed CA based statistics had much lower missed detection rates for 12 out of the 18 fault cases.

Table 6
Missed detection rates for the testing set

Fault    PCA Q    DPCA Q   CA Q      PCA T2   DPCA T2   CA T2
IDV-1    0.003    0.005    0.0015*   0.008    0.006     0.0080
IDV-2    0.014    0.015    0.0135    0.020    0.019     0.0120*
IDV-4    0.038    0*       0*        0.956    0.939     0.9406
IDV-5    0.746    0.748    0*        0.775    0.758     0.7505
IDV-6    0*       0*       0*        0.011    0.013     0.0065
IDV-7    0*       0*       0*        0.085    0.159     0*
IDV-8    0.024    0.025    0.0100*   0.034    0.028     0.0261
IDV-10   0.659    0.665    0.5030*   0.666    0.580     0.6302
IDV-11   0.356    0.193*   0.5618    0.794    0.801     0.9713
IDV-12   0.025    0.024    0.0090*   0.029    0.010     0.0417
IDV-13   0.045*   0.049    0.0543    0.060    0.049     0.0648
IDV-14   0*       0*       0*        0.158    0.061     0.9733
IDV-16   0.755    0.708    0.3641*   0.834    0.783     0.8541
IDV-17   0.108    0.053*   0.2977    0.259    0.240     0.8848
IDV-18   0.101    0.100    0.0246*   0.113    0.111     0.0281
IDV-19   0.873    0.753*   0.9743    0.996    0.993     0.9708
IDV-20   0.550    0.490*   0.5045    0.701    0.644     0.7660
IDV-21   0.570    0.558*   0.9939    0.736    0.644     0.9884

*Minimum missed detection rate for each fault.

5.5. Fault diagnosis

Once a fault is detected during online process monitoring, isolating the variables that are most closely related to the fault is very important. Generally, the diagnostic features of the monitoring algorithms (e.g., contribution plots in the PCA algorithm) flag a number of additional candidate variables, which are (likely spuriously) indicated as contributing to the fault. Usually, only a smaller subset of these candidate variables would actually be contributing to the fault. Thus, the success of the diagnostic method depends on (i) ensuring that the faulty variable is included in the set of candidate variables and (ii) ensuring that this set of candidate variables is small.

Contribution plots have been used in conjunction with PCA for the purpose of fault diagnosis. In the proposed monitoring scheme based on CA, equivalent contribution plots have been defined based on the Q statistic generated from the CA model. Variables having a contribution towards the residual greater than a specified threshold are considered significant towards fault diagnosis and belong to the set of likely fault variables.

Table 7
Variables with higher contribution for each fault case using PCA and CA based residual contributions

Fault     PCA                       CA
IDV(1)    16, 20, 46                16, 13, 7, 20
IDV(2)    30, 24, 16, 40            40, 4, 16, 13, 20, 7
IDV(4)    51, 21, 9                 51
IDV(5)    11, 9, 35, 22, 18         4, 21, 22, 11
IDV(6)    44, 1                     44, 1
IDV(7)    4, 16, 36, 38, 31, 25     16, 7, 13, 21, 4
IDV(8)    3, 46, 29, 16, 20         46, 13, 7, 16, 20
IDV(10)   19, 34, 35                19, 20, 18
IDV(11)   51, 9, 21                 51
IDV(12)   11, 37, 22, 4             21, 4, 11, 22
IDV(13)   37, 32, 26, 42            51, 16, 20, 13, 7
IDV(14)   21, 51, 9, 2, 42          21, 51
IDV(16)   50, 3, 27                 50, 19
IDV(17)   21, 2, 42                 51, 21
IDV(18)   22, 11, 40                4, 11, 40, 22
IDV(19)   5, 27, 37, 46, 38         40, 51, 20, 4
IDV(20)   46, 39, 13, 23            8, 42, 11, 40
IDV(21)   39, 26, 5                 13, 7, 42, 20

Fig. 5. Effect of faults IDV 4 and IDV 11 on the RCW flow (manipulated variable).

For the Tennessee Eastman problem, contribution plot based fault diagnosis was carried out for all fault cases (IDV-1 to IDV-21) using both the PCA and the CA algorithms. As discussed earlier, a fault is detected when six consecutive values of a Q or T² statistic go out of control. The diagnosis is then performed using the contribution plot: the residual contribution of all the variables at each of these six instances is first calculated, the mean of these residual contributions is taken, and the variables having a contribution greater than the specified threshold (of 5%) are considered most relevant to the fault that has occurred. The monitoring and diagnosis results obtained for all fault cases using PCA and CA are tabulated in Table 7. In a majority of the cases, the fault variable sets generated by diagnosis using PCA and CA had a number of common variables, and in most cases these variables were indeed found to be associated with the fault; such variables, which showed a higher contribution towards the residual for both PCA and CA, appear in both columns of Table 7. From Table 7 it is evident that, in 7 out of the 18 faults, the CA algorithm generated a smaller number of candidate fault variables than the PCA algorithm (in 6 of the cases studied, both PCA and CA gave the same number of candidate variables).

Fig. 6. Comparison of contribution plots for IDV 4 and IDV 11 based on PCA residuals.

For four of the faults, there was no overlap in the variables between the fault candidate sets generated by PCA and CA. Of these, IDV-13 and IDV-21 are further analyzed here to compare the relative performance of PCA and CA. In the case of IDV-13 (reactor kinetics), PCA highlighted variables 37 (Component D, Stream 11), 32 (Component D, Stream 9), 26 (Component D, Stream 6) and 42 (D feed flow) as likely candidates for the fault, whereas CA flagged variables 51 (Reactor cooling water flow), 16 (Stripper pressure), 20 (Compressor work), 13 (Product separator pressure) and 7 (Reactor pressure). When comparing these two sets in relation to the actual TE process via a cause-effect analysis, the variables flagged by CA were found to be more representative of the fault IDV-13. A similar interpretation of the results for IDV-21 also indicated that CA indicated likely faults in a more consistent fashion than the PCA algorithm. Some additional and interesting fault scenarios are discussed here in detail.

Case 1: IDV 4 and IDV 11

Faults IDV-4 and IDV-11 involve a disturbance in the reactor cooling water (RCW) inlet temperature. The effect of these two faults on the RCW flow (a manipulated variable) is shown in Fig. 5.

Fig. 7. Comparison of contribution plots for IDV 4 and IDV 11 based on CA residuals.