
Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed

Tolga Kurtoglu 1, Sriram Narasimhan 2, Scott Poll 3, David Garcia 4, Stephanie Wright 5

1 Mission Critical Technologies @ NASA Ames Research Center, Moffett Field, CA, 94035, USA, [email protected]

2 University of California, Santa Cruz @ NASA Ames Research Center, Moffett Field, CA, 94035, USA, [email protected]

3 NASA Ames Research Center, Moffett Field, CA, 94035, USA, [email protected]

4 Stinger Ghaffarian Technologies @ NASA Ames Research Center, Moffett Field, CA, 94035, USA, [email protected]

5 Vanderbilt University, Nashville, TN, 37203, USA, [email protected]

ABSTRACT

Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support to perform systematic benchmarking of these algorithms continues to create barriers for effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT) developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, a description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.

1 INTRODUCTION

Fault Diagnosis in physical systems involves the detection of anomalous system behavior and the identification of its cause.

This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Key steps in the diagnostic inference are fault detection (is the output of the system incorrect?), fault isolation (what is broken in the system?), fault identification (what is the magnitude of the failure?), and fault recovery (how can the system continue to operate in the presence of the faults?). Expert knowledge and prior know-how about the system, models describing the behavior of the system, and sensor data from the system during actual operation are used to develop diagnostic inference algorithms. This problem is non-trivial for a variety of reasons including:

• Incorrect and/or insufficient knowledge about system behavior

• Limited observability

• Presence of many different types of faults (system/supervisor/actuator/sensor faults, additive/multiplicative faults, abrupt/incipient faults, persistent/intermittent faults)

• Non-local and delayed effects of faults due to the dynamic nature of system behavior

• Presence of other phenomena that influence/mask the symptoms of faults (unknown inputs acting on the system, noise that affects the output of sensors, etc.)

Several communities have attempted to solve the diagnostic inference problem using various methods. Some typical approaches have been:

• Expert Systems - These approaches encode knowledge about system behavior into a form that can be used for inference. Some examples are rule-based systems (Kostelezky et al., 1990) and fault trees (Kavcic and Juricic, 1997).

• Model-based Systems - These approaches use an explicit model of the system configuration and behavior to guide the diagnostic inference. Some examples are "FDI" methods (Gertler and Inc., 1998), statistical methods (Basseville and Nikiforov, 1993), and "AI" methods (Hamscher et al., 1992).



• Data-driven Systems - These approaches use only the data from representative runs to learn parameters that can then be used for anomaly detection or diagnostic inference for future runs. Some examples are IMS (Iverson, 2004), Neural Networks (Sorsa and Koivo, 1998), etc.

• Stochastic Methods - These approaches treat the diagnosis problem as a belief state estimation problem. Some examples are Bayesian Networks (Lerner et al., 2000), Particle Filters (de Freitas, 2001), etc.

Despite the development of such a variety of notations, techniques, and algorithms, efforts to evaluate and compare the different diagnostic algorithms (DAs) have been minimal (discussed in Section 2). One of the major deterrents is the lack of a common framework for evaluating and comparing diagnostic algorithms. Such a framework would consist of the following:

• Define a standard representation format for the system description, sensor data, and diagnosis results

• Develop a software run-time architecture that can run specific scenarios from the actual system, a simulation, or other data sources such as files (individually or as a batch), execute DAs, send scenario data to the DA at appropriate time steps, and archive the diagnostic results from the DA

• Define a set of metrics to be computed based on the comparison of the actual scenario and diagnosis results from the DA

Some initial steps in developing such a framework are presented in (Kurtoglu et al., 2009b) as part of the DX Competition initiative (Kurtoglu et al., 2009a). In this paper, we present our efforts to benchmark a set of DAs on the NASA Ames Electrical Power System testbed (ADAPT) using the DXC framework. Section 2 describes other work related to benchmarking of DAs. Section 3 presents the DXC framework in brief. Section 4 describes how the benchmarking was performed, including a description of the ADAPT system, the faults injected, the DAs tested, and the metrics computed. Section 5 presents the results and detailed analyses of the benchmarking activity. Section 6 lists the limitations and plans for future work. Lastly, Section 7 presents the conclusions.

2 RELATED WORK

Several researchers have attempted to demonstrate benchmarking capability on different systems. Among these, (Orsagh et al., 2002) provided a set of 14 metrics to measure the performance and effectiveness of prognostics and health management algorithms for US Navy applications (Roemer et al., 2005). (Bartys et al., 2006) presented a benchmarking study for actuator fault detection and identification (FDI). This study, developed by the DAMADICS Research Training Network, introduced a set of 18 performance indices used for benchmarking of FDI algorithms on an industrial valve-actuator system. Izadi-Zamanabadi and Blanke (1999) presented a ship propulsion system as a benchmark for autonomous fault control. This benchmark has two main elements. One is the development of an FDI algorithm, and the other is the analysis and implementation of autonomous fault accommodation. Finally, (Simon et al., 2008) introduced a benchmarking technique for gas path diagnosis methods to assess the performance of engine health management technologies.

The approach presented in this paper uses the DXC Framework (Kurtoglu et al., 2009b) (described in Section 3), which adopts some of its metrics from the literature (Society of Automotive Engineers, 2007; Orsagh et al., 2002; Simon et al., 2008) and extends prior work in this area by 1) defining a number of new benchmarking indices, 2) providing a generic, application-independent architecture that can be used for benchmarking different monitoring and diagnostic algorithms, and 3) facilitating the use of real process data on large-scale, complex engineering systems. Moreover, it is not restricted to a single-fault assumption and enables the calculation of benchmarking metrics for systems in which each fault scenario may contain multiple faults.

3 FRAMEWORK OVERVIEW

The framework architecture employed for benchmarking diagnostic algorithms is shown in Figure 1. Major elements are the physical system (ADAPT EPS testbed), diagnostic algorithms, scenario-based experiments, and benchmarking software. The physical system description and sample data (nominal and faulty) are provided to algorithm and model developers to build DAs. System documentation in XML format specifies the components, connections, and high-level mode behavior descriptions, including failure modes. A diagram with component labels and connection information is also provided. The documentation defines the component and mode identifiers DAs must report in their diagnoses for proper benchmarking. The fault catalog, part of the system documentation, establishes the failure modes that may be injected into experimental test scenarios and diagnosed by the DAs. Benchmarking software is used to quantitatively evaluate the DA output against the known fault injections using predefined metrics.
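To illustrate how a DA developer might consume the XML system documentation and fault catalog described above, the sketch below parses a small, hypothetical catalog fragment into a lookup of component identifiers and failure-mode identifiers. The element and attribute names (component, failureMode, id) are assumptions for illustration only; they are not the actual DXC schema.

```python
# Minimal sketch: reading a hypothetical XML fault catalog of the kind described
# above (component ids and their reportable failure-mode ids). Tag and attribute
# names are assumptions, not the DXC documentation format.
import xml.etree.ElementTree as ET

CATALOG = """
<system name="ADAPT-Lite">
  <component id="EY260" type="Relay">
    <failureMode id="StuckOpen"/>
    <failureMode id="StuckClosed"/>
  </component>
  <component id="E261" type="VoltageSensor">
    <failureMode id="StuckAtValue"/>
    <failureMode id="Offset"/>
  </component>
</system>
"""

def load_fault_catalog(xml_text):
    """Return a dict mapping component id -> list of failure-mode ids."""
    root = ET.fromstring(xml_text)
    return {comp.get("id"): [fm.get("id") for fm in comp.findall("failureMode")]
            for comp in root.findall("component")}

if __name__ == "__main__":
    print(load_fault_catalog(CATALOG))
    # {'EY260': ['StuckOpen', 'StuckClosed'], 'E261': ['StuckAtValue', 'Offset']}
```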

Execution and evaluation of diagnostic algorithms are accomplished with the run-time architecture depicted in Figure 2. The architecture has been designed to interface DAs with little overhead by using minimalistic C++ and Java APIs or by implementing a simple ASCII-based TCP messaging protocol. The Scenario Loader is the main entry point for DAs; it starts and stops other processes and cleans up upon completion of all scenarios.


Figure 1: Benchmarking Framework Architecture

The Scenario Data Source publishes archived datasets containing commands and sensor values following a wall-clock schedule specified by timestamps in the scenario files. The DA uses the diagnostic framework messaging interface to receive sensor and command data, perform a diagnosis, and publish the results; it is the only component that is implemented by DA developers. The Scenario Recorder timestamps each fault injection and diagnosis message upon arrival and compiles it into a Scenario Results file. The Evaluator takes the Scenario Results file and calculates metrics to evaluate DA performance.

Messages in the run-time architecture are exchanged as ASCII text over TCP/IP. DAs may use provided API calls for parsing, sending, and receiving messages or choose to use the underlying TCP/IP. The DA output is standardized to facilitate benchmarking and includes a timestamp indicating when the diagnosis has been issued; a detection signal that indicates whether the DA has detected a fault; an isolation signal that indicates whether a DA has isolated a candidate or set of candidates with associated probabilities; and a candidate fault set that has one or multiple candidates, each with a single fault or multiple faults. More details about the framework can be found in (Kurtoglu et al., 2009b).


Figure 2: Run-time Architecture
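To make the standardized DA output described above concrete, the sketch below assembles a diagnosis record with the listed fields (timestamp, detection signal, isolation signal, and a weighted candidate fault set) and serializes it as a single ASCII line. The field layout and delimiters shown here are illustrative assumptions, not the actual DXC message grammar or API documented in (Kurtoglu et al., 2009b).

```python
# Illustrative sketch of a DA output record with the fields described above.
# The serialization format is an assumption, not the DXC ASCII/TCP protocol.
import time
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Diagnosis:
    timestamp_ms: int                      # when the diagnosis was issued
    detection: bool                        # has a fault been detected?
    isolation: bool                        # has a candidate (set) been isolated?
    # each candidate maps component id -> failure-mode id; weights are the
    # probabilities the DA associates with the candidates
    candidates: List[Dict[str, str]] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def to_ascii(self) -> str:
        cands = ";".join(
            ",".join(f"{comp}={mode}" for comp, mode in cand.items()) + f"@{w:.2f}"
            for cand, w in zip(self.candidates, self.weights))
        return (f"diagnosis @{self.timestamp_ms} detection={self.detection} "
                f"isolation={self.isolation} candidates=[{cands}]")

# Example: two single-fault candidates reported with probabilities 0.8 and 0.2
msg = Diagnosis(int(time.time() * 1000), True, True,
                [{"EY260": "StuckOpen"}, {"E261": "Offset"}], [0.8, 0.2])
print(msg.to_ascii())
```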

4 BENCHMARKING ON ADAPT EPS TESTBED

4.1 ADAPT EPS System Description

The physical system used for benchmarking is the Electrical Power System testbed in the ADAPT lab at NASA Ames Research Center (Poll et al., 2007). The ADAPT EPS testbed provides a means for evaluating diagnostic algorithms through the controlled insertion of faults in repeatable failure scenarios. The EPS testbed incorporates low-cost commercial off-the-shelf (COTS) components connected in a system topology that provides the functions typical of aerospace vehicle electrical power systems: energy conversion/generation (battery chargers), energy storage (three sets of lead-acid batteries), power distribution (two inverters, several relays, circuit breakers, and loads), and power management (command, control, and data acquisition). The EPS delivers AC (Alternating Current) and DC (Direct Current) power to loads, which in an aerospace vehicle could include subsystems such as the avionics, propulsion, life support, environmental controls, and science payloads. A data acquisition and control system commands the testbed into different configurations and records data from sensors that measure system variables such as voltages, currents, temperatures, and switch positions. Data is presently acquired at a 2 Hz rate.

The scope of the ADAPT EPS testbed used in the benchmarking study is shown in Figure 3. Power storage and distribution elements from the batteries to the loads are within scope; there are no power generation elements. In order to encourage more DA developers to participate in the benchmarking effort, we also included a simplified scope of the system called ADAPT-Lite, which includes a single battery connected to a single load as indicated by the dashed lines in the schematic (Figure 3). The characteristics of ADAPT-Lite and ADAPT are summarized in Table 1. The greatest simplification of ADAPT-Lite relative to ADAPT is not the reduced size of the domain but the elimination of nominal mode transitions. The starting configuration for ADAPT-Lite data has all relays and circuit breakers closed, and no nominal mode changes are commanded during the scenarios. Hence, any noticeable changes in sensor values may be correctly attributed to faults injected into the scenarios. By contrast, the initial configuration for ADAPT data has all relays open, and nominal mode changes are commanded during the scenarios. The commanded configuration changes result in adjustments to sensor values as well as transients which are nominal and not indicative of injected faults. Finally, ADAPT-Lite is restricted to single faults whereas multiple faults are allowed in ADAPT.

Fault Injection and Scenarios

ADAPT supports the repeatable injection of faults into the system in three ways:


Figure 3: ADAPT EPS Schematic

Table 1: ADAPT EPS Benchmarking System Characteristics

Aspect                  ADAPT-Lite                               ADAPT
#Comps/Modes            37/93                                    173/430
Initial State           Relays closed; circuit breakers closed   Relays open; circuit breakers closed
Nominal mode changes?   No                                       Yes
Multiple faults?        No                                       Yes

Hardware-Induced Faults: These faults are physically injected at the testbed hardware. A simple example is tripping a circuit breaker using the manual throw bars. Another is using the power toggle switch to turn off the inverter. Faults may also be introduced in the loads attached to the EPS. For example, the valve can be closed slightly to vary the back pressure on the pump and reduce the flow rate.

Software-Induced Faults: In addition to fault injection through hardware, faults may be introduced via software. Software fault injection includes one or more of the following: 1) sending commands to the testbed that were not intended for nominal operations; 2) blocking commands sent to the testbed; and 3) altering the testbed sensor data.

Real Faults: In addition to the aforementioned two methods, real faults may be injected into the system by using actual faulty components. A simple example is a blown light bulb. This method of fault injection was not used in this study.

For the results presented in this paper, only abrupt discrete (change in operating mode of a component) and parametric (step change in parameter values) faults are considered. The distinct fault types that are injected into the testbed for benchmarking are shown in Table 2.
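As a simple illustration of the third software injection mechanism (altering sensor data) and of the two abrupt fault types just described, the sketch below applies either a step offset or a stuck-at fault to a stream of sensor samples after a chosen injection time. The function name and parameters are invented for the example; they are not the testbed's actual injection interface.

```python
# Sketch: altering a sensor data stream to emulate the abrupt parametric
# (offset) and discrete (stuck-at) faults used in this study. The function
# and its parameters are illustrative, not the ADAPT injection interface.
def inject_fault(samples, t_inj, fault="offset", magnitude=1.0):
    """samples: list of (time_s, value); fault applied for t >= t_inj."""
    faulty, stuck_value = [], None
    for t, v in samples:
        if t < t_inj:
            faulty.append((t, v))
        elif fault == "offset":
            faulty.append((t, v + magnitude))     # step change, noise preserved
        elif fault == "stuck":
            if stuck_value is None:
                stuck_value = v                   # freeze at the last reading
            faulty.append((t, stuck_value))       # zero noise afterwards
        else:
            raise ValueError(f"unknown fault type: {fault}")
    return faulty

# Example: 2 Hz samples, fault injected at t = 30 s
nominal = [(i * 0.5, 24.0 + 0.01 * (i % 3)) for i in range(120)]
print(inject_fault(nominal, 30.0, "stuck")[58:62])
```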

The diagnostic algorithms are tested against a number of diagnostic scenarios consisting of nominal or faulty data of approximately four minutes in length. Some key points and intervals of a notional scenario are illustrated in Figure 4, which splits the scenario into three important time intervals: Δstartup, Δinjection, and Δshutdown. During the first interval, Δstartup, the diagnostic algorithm is given time to initialize, read data files, etc. Note that this is not the time for compilation; compilation-based algorithms compile their models beforehand. Though sensor observations may be available during Δstartup, no faults are injected during this time. Fault injection takes place during Δinjection.


Table 2: Fault types used for ADAPT and ADAPT-Lite

Component           Fault Description
Battery             Degraded
Boolean Sensor      Stuck at Value
Circuit Breaker     Failed Open; Stuck Closed
Inverter            Failed Off
Relay               Stuck Open; Stuck Closed
Sensor              Stuck at Value; Offset
Pump (Load)         Flow Blocked; Failed Off
Fan (Load)          Over Speed; Under Speed; Failed Off
Light Bulb (Load)   Failed Off
Basic Load          Failed Off

Once faults are injected, they persist until the end of the scenario. Multiple faults may be injected simultaneously or sequentially. Finally, the algorithms are given some time post-injection to send final diagnoses and gracefully terminate during Δshutdown. The intervals used in this study are 30 seconds, 3 minutes, and 30 seconds for Δstartup, Δinjection, and Δshutdown, respectively.

Below are some notable points for the example diagnostic scenario from Figure 4:

• tinj - A fault is injected at this time;

• tfd - The diagnostic algorithm has detected a fault;

• tffi - The diagnostic algorithm has isolated a fault for the first time;

• tfir - The diagnostic algorithm has modified its isolation assumption;

• tlfi - This is the last fault isolation during Δinjection.

Figure 4: Key time points, intervals, and signals

Nominal and failure scenarios were created using the hardware- and software-induced fault injection methods according to the intervals specified above and using the faults listed in the fault catalog (Table 2). As shown in Table 3, nominal scenarios comprise roughly half of the ADAPT-Lite and one-third of the ADAPT test scenarios. The ADAPT-Lite fault scenarios are limited to single faults. Half of the ADAPT fault scenarios are single faults; the others are double or triple faults.

Table 3: Number of sample and test scenarios for ADAPT and ADAPT-Lite

                Sample               Competition
#Scenarios      ADAPT-Lite   ADAPT   ADAPT-Lite   ADAPT
Nominal         32           39      30           40
Single-fault    27           54      32           40
Double-fault    0            19      0            30
Triple-fault    0            1       0            10

Diagnostic Challenges

The ADAPT EPS testbed offers a number of challenges to DAs. It is a hybrid system with multiple modes of operation due to switching elements such as relays and circuit breakers. There are continuous dynamics within the operating modes and components from multiple physical domains, including electrical, mechanical, and hydraulic. It is possible to inject multiple faults into the system. Furthermore, timing considerations and transient behavior must be taken into account when designing DAs. For example, when power is input to the inverter there is a delay of a few seconds before power is available at the output. For some loads, there is a large current transient when the device is turned on. System voltages and currents depend on the loads attached, and noise in sensor data increases as more loads are activated. Measurement noise occasionally exhibits spikes and is non-Gaussian. The 2 Hz sample rate limits the types of features that may be extracted from measurements. Finally, there may be insufficient information and data to estimate parameters of dynamic models in certain modeling paradigms.

4.2 Diagnostic Metrics

A set of 8 metrics has been defined for assessing the performance of the diagnostic algorithms. This set is categorized into temporal, technical, and computational performance metrics. The temporal metrics measure how quickly an algorithm responds to faults in a physical system. The technical metrics measure non-temporal features of a diagnostic algorithm, such as accuracy. Finally, computational metrics are intended to measure how efficiently an algorithm uses the available computational resources.


In addition, we divide the metrics into two main categories:

Detection metrics, which deal with the temporal, technical, and computational metrics associated with only the detection of the fault.

Isolation metrics, which deal with the temporal, technical, and computational metrics associated with the isolation of the fault.

The notation used for the definition of the metrics is as follows:

S - The set of scenarios for a given system description

Sn - The set of nominal scenarios for a given system description

Sf - The set of faulty scenarios for a given system description

tfd - The time when the fault detection signal has been asserted for the first time

tfi - The time when the last persistent fault isolation signal has been asserted

wact - The true component mode vector (ground truth)

wpre - The predicted component mode vector (represents the set of candidate diagnoses by the DA)

Td - The vector of CPU times spent by the diagnostic algorithm at every time step (total computation time)

Md - The vector of peak memory allocations at every time step

Finally, using the aforementioned notation, the 8 metrics are defined as:

Fault Detection Time (Mfd): The reaction time for a diagnostic engine in detecting an anomaly (Kurtoglu et al., 2008):

$M_{fd} = t_{fd}$  (1)

Fault Isolation Time (Mfi): The time for isolating a fault (Kurtoglu et al., 2008). In many applications this metric is less important than the diagnostic accuracy, but it is important in sequential diagnosis, probing, etc.

$M_{fi} = t_{fi}$  (2)

False Positive Rate (Mfp): The metric that penalizes diagnostic algorithms which announce spurious faults (Kurtoglu et al., 2008). The false positive rate is defined as:

$M_{fp} = \frac{\sum_{s \in S} m_{fp}(s)}{|S|}$  (3)

where for each scenario s the "false positive" function $m_{fp}(s)$ is defined as:

$m_{fp}(s) = \begin{cases} 1, & \text{if } t_{fd} < t_{inj} \\ 0, & \text{otherwise} \end{cases}$  (4)

where $t_{inj} = \infty$ for nominal scenarios.

False Negative Rate (Mfn): The metric that measures the ratio of faults missed by a diagnostic algorithm (Kurtoglu et al., 2008):

$M_{fn} = \frac{\sum_{s \in S_f} m_{fn}(s)}{|S_f|}$  (5)

where for each scenario s the "false negative" function $m_{fn}(s)$ is defined as:

$m_{fn}(s) = \begin{cases} 1, & \text{if } t_{fd} = \infty \\ 0, & \text{otherwise} \end{cases}$  (6)

Detection Accuracy (MFDA): The fault detection accuracy is the ratio of the number of correctly classified cases to the total number of cases (Kurtoglu et al., 2008). It is defined as:

$M_{FDA} = 1 - \frac{\sum_{s \in S} \left( m_{fp}(s) + m_{fn}(s) \right)}{|S|}$  (7)
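A compact sketch of how the detection metrics of Equations (3)-(7) could be computed from per-scenario records follows. The record fields (t_inj, t_fd) are assumptions about how scenario results might be stored, not the Evaluator's actual data structures.

```python
# Sketch of the detection metrics in Eqs. (3)-(7). A scenario record holds the
# injection time (math.inf for nominal runs) and the first detection time
# (math.inf if no detection). Field names are illustrative only.
import math
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    t_inj: float   # fault injection time; math.inf for nominal scenarios
    t_fd: float    # first fault detection time; math.inf if never detected

def detection_metrics(results):
    faulty = [r for r in results if r.t_inj != math.inf]
    m_fp = [1 if r.t_fd < r.t_inj else 0 for r in results]       # Eq. (4)
    m_fn = [1 if r.t_fd == math.inf else 0 for r in faulty]      # Eq. (6)
    M_fp = sum(m_fp) / len(results)                               # Eq. (3)
    M_fn = sum(m_fn) / len(faulty) if faulty else 0.0             # Eq. (5)
    M_fda = 1.0 - (sum(m_fp) + sum(m_fn)) / len(results)          # Eq. (7)
    return M_fp, M_fn, M_fda

# Example: one nominal run with a spurious detection, two faulty runs
runs = [ScenarioResult(math.inf, 42.0),
        ScenarioResult(60.0, 75.5),
        ScenarioResult(60.0, math.inf)]
print(detection_metrics(runs))   # (0.333..., 0.5, 0.333...)
```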

Classification Errors (Mia): The isolation classification error metric measures the accuracy of the fault isolation by a diagnostic algorithm and is defined as the Hamming distance (the Hamming distance between two strings of data values is the number of positions for which the corresponding data values are different) between the true component mode vector wact and the predicted component mode vector wpre.

In the calculation of the classification error metric, the data values for the Hamming distance are the respective modes of the components comprising a system description. For example, if the true component mode vector of the system is [1,0,0,1,0] and the predicted component mode vector is [1,1,0,0,0], the classification error is 2. If more than one predicted mode vector is reported by a DA (meaning that the diagnostic output consists of a set of candidate diagnoses), then the classification error is calculated for each predicted component mode vector and weighted by the candidate probabilities reported by the DA.
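The sketch below mirrors the calculation just described: a Hamming distance between the true and predicted component mode vectors, weighted by candidate probabilities when a DA reports multiple candidates. The vector encoding and function names are illustrative.

```python
# Sketch of the weighted Hamming-distance classification error described above.
# Mode vectors are lists of mode labels, one entry per component.
def hamming(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

def classification_error(w_act, candidates, probs):
    """candidates: list of predicted mode vectors; probs: their reported weights."""
    total = sum(probs)
    return sum(p / total * hamming(w_act, w_pre)
               for w_pre, p in zip(candidates, probs))

# Worked example from the text: true vector [1,0,0,1,0] vs. predicted
# [1,1,0,0,0] gives an error of 2; adding an exactly correct second candidate
# with equal weight halves the weighted error.
w_act = [1, 0, 0, 1, 0]
print(classification_error(w_act, [[1, 1, 0, 0, 0]], [1.0]))              # 2.0
print(classification_error(w_act, [[1, 1, 0, 0, 0], w_act], [0.5, 0.5]))  # 1.0
```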

CPU Load (Mcpu): This is the average CPU load during the experiment:

$M_{cpu} = t_s + \sum_{q \in T_d} q$  (8)

where ts is the startup time of the diagnostic engine and Td is a vector with the actual CPU time spent by the diagnostic algorithm at every time step in the diagnostic session.


Memory Load (Mmem): This is the maximum memory size over all steps in the diagnostic session:

$M_{mem} = \max_{m \in M_d} m$  (9)

where Md is a vector with the maximum memory size at every step in the diagnostic session.
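Following Equations (8) and (9), these two resource metrics reduce to a sum and a maximum over per-step measurements; a minimal sketch, assuming per-step CPU times (ms) and peak-memory samples (kB) have already been recorded:

```python
# Minimal sketch of Eqs. (8) and (9), assuming per-step CPU times and per-step
# peak memory samples were recorded during a diagnostic session.
def cpu_load(t_startup_ms, per_step_cpu_ms):
    return t_startup_ms + sum(per_step_cpu_ms)    # Eq. (8)

def memory_load(per_step_mem_kb):
    return max(per_step_mem_kb)                   # Eq. (9)

print(cpu_load(200, [5, 7, 6, 9]))       # 227
print(memory_load([1500, 1680, 1620]))   # 1680
```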

4.3 Diagnostic Algorithms

A total of ten DAs were tested: nine on ADAPT-Lite and six on the full ADAPT system. Brief descriptions of each of these algorithms are provided below.

1. FACT - a model-based diagnosis system that uses hybrid bond graphs, and models derived from them, at all levels of diagnosis, including fault detection, isolation, and identification. Faults are detected using an observer-based approach with statistical techniques for robust detection. Faults are isolated by matching qualitative deviations caused by fault transients to those predicted by the model. For systems with few operating configurations, fault isolation is implemented in a compiled form to improve performance (Roychoudhury et al., 2009).

2. FaultBuster - is based on a combination of multivariate statistical methods for the generation of residuals. Once detection has occurred, a neural network performs classification to isolate the fault.

3. HyDE-A - HyDE (Hybrid Diagnosis Engine) is a model-based diagnosis engine that uses consistency between model predictions and observations to generate conflicts, which in turn drive the search for new fault candidates. HyDE-A uses discrete models of the system and a discretization of the sensor observations for diagnosis (Narasimhan and Brownston, 2007).

4. HyDE-S - uses the HyDE system but runs it on interval-valued hybrid models and the raw sensor data (Narasimhan and Brownston, 2007).

5. ProADAPT - processes all incoming environment data (observations from a system being diagnosed) and acts as a gateway to a probabilistic inference engine. It uses the Arithmetic Circuit (AC) Evaluator, which is compiled from Bayesian network models. The primary advantage to using ACs is speed, which is key in resource-bounded environments (Mengshoel, 2007).

6. RacerX - is a detection-only algorithm which detects a percentage change in individual filtered sensor values to raise a fault detection flag.

7. RODON - is based on the principles of the General Diagnostic Engine (GDE) as described by de Kleer and Williams and the G+DE by Heller and Struss. RODON uses contradictions (conflicts) between the simulated and the observed behavior to generate hypotheses about possible causes for the observed behavior. If the model contains failure modes besides the nominal behavior, these can be used to verify the hypotheses, which speeds up the diagnostic process and improves the results (Karin et al., 2006).

8. RulesRule - is a rule-based, isolation-only algorithm. The rule base was developed by analyzing the sample data and determining characteristic features of faults. There is no explicit fault detection, though isolation implicitly means that a fault has been detected.

9. StanfordDA - is an optimization-based approach to estimating fault states in a DC power system. The model includes faults changing the circuit topology along with sensor faults. The approach can be considered as a relaxation of the mixed estimation problem. A linear model of the circuit is developed and a convex problem is posed for estimating the faults and other hidden states. A sparse fault vector solution is computed by using l1 regularization (Zymnis et al., 2009).

10. Wizards of Oz - is a consistency-based algorithm. The model of the system completely defines the stable (static) output of the system in case of normal and faulty behavior. Given a new command or new observations, the algorithm waits for a stable state and computes the minimum diagnoses consistent with the observations and the previous diagnoses.

5 RESULTS AND DISCUSSION

The benchmarking results for ADAPT-Lite and ADAPT are shown in Tables 4 and 5, respectively. Figures 5-8 are graphical depictions of data in Table 4 and Figures 13-16 are graphical depictions of data in Table 5. No DA dominated over all metrics used for benchmarking. For ADAPT-Lite, seven of the nine algorithms tested were best or second best with respect to at least one of the metrics. For ADAPT, five of six algorithms fit this description.

We discuss the ADAPT-Lite results next. Figure 5 shows the false positive rate, false negative rate, and detection accuracy. As is evident from the definition of the metrics in Section 4.2, an algorithm that has lower false positive and negative rates will have higher detection accuracy. False positives were counted in the following two situations: for nominal scenarios where the DA declared a fault; and for faulty scenarios where the DA declared a fault before any fault was injected. An error in the rule base of RulesRule led to more false positive indications for the faulty scenarios than for the nominal scenarios and also resulted in a large number of classification errors. Other false positives were due to noise in the data. Figure 9 shows a nominal run in which a current sensor exhibited a spike at approximately 73 seconds. Seven out of nine DAs issued a false positive for this run; only HyDE-A and Wizards of Oz did not declare a fault.


Figure 5: ADAPT-Lite false positive rate, false negative rate, and detection accuracy by DA.

Figure 7: ADAPT-Lite detection and isolation times by DA.

Figure 6: ADAPT-Lite classification errors by DA.


Many false negatives were caused by scenarios in which a sensor reading was stuck within the nominal range of the sensor. Figure 10 shows a voltage sensor that freezes its reading at 118 seconds. Only ProADAPT correctly detected this fault.

The number of classification errors for each DA is shown in Figure 6. ProADAPT was the only DA to correctly detect and classify sensor-stuck faults of the type shown in Figure 10. Furthermore, aside from one run, it also correctly distinguished between sensor-stuck and sensor-offset faults, examples of which are shown in Figure 11. The distinction in the fault behavior is that a stuck reading has zero noise while an offset reading has the noise of the original signal. Also, the sensor-stuck faults were set to the minimum or maximum value of the sensor or held at its last reading; however, we did not specify to DA developers that the stuck values would be limited to these cases. Figure 12 shows the number of faults injected and aggregate classification errors for all DAs by component/fault type.

Figure 8: ADAPT-Lite CPU time and peak memory usage by DA.

The category ‘Sensor’ includes faults in current, voltage, temperature, phase angle, and switch position sensors. Because of this grouping, the figure shows more sensor faults injected than other types. The number of classification errors in sensor fault scenarios is quite high. Part of the reason is that most DAs reported either stuck or offset consistently, or they reported both with equal weight. In this benchmarking study, no partial credit was given for correctly naming the failed component but incorrectly isolating the failure mode. We realize, however, that isolating to a failed component or line-replaceable unit (LRU) in maintenance operations is sometimes all that is required. We plan to revisit this metric in future benchmarking work.
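Given the stuck/offset distinction noted above (a stuck reading has zero noise, while an offset keeps the original signal's noise), one simple post-detection discriminator is to compare the residual variance of the suspect sensor against its nominal noise level. The sketch below is one possible heuristic, not the logic of any benchmarked DA; the variance threshold is an assumption.

```python
# Heuristic sketch for distinguishing sensor-stuck from sensor-offset faults
# using the zero-noise property of a stuck reading. This is an illustration,
# not any benchmarked DA's actual logic; the threshold is an assumption.
from statistics import pvariance

def classify_sensor_fault(window, nominal_variance, frac=0.05):
    """window: recent samples after detection; returns 'StuckAtValue' or 'Offset'."""
    if pvariance(window) < frac * nominal_variance:
        return "StuckAtValue"   # essentially no noise -> frozen reading
    return "Offset"             # noise preserved -> biased but live signal

nominal_var = 0.01
print(classify_sensor_fault([23.0] * 20, nominal_var))                  # StuckAtValue
print(classify_sensor_fault([23.0 + 0.1 * (i % 3) for i in range(20)],
                            nominal_var))                               # Offset
```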

In several instances DAs reported component mode IDs which did not match the names specified in the system catalog. For these cases the diagnosis was treated as an empty candidate. This could either negatively or positively impact the classification error metric depending on whether the DA had a correct or incorrect isolation.

The detection and isolation times are shown in Figure 7. First, note that RacerX did not have an isolation time, as it was a detection-only DA.


Figure 9: A nominal run with a spike in sensor IT240, battery 2 current.

Figure 11: Fan speed sensor ST516 with sensor-offset and sensor-stuck faults.


Figure 10: An example of the sensor-stuck failure mode for voltage sensor E261, the downstream voltage of relay EY260.

Second, note the somewhat confusing result that the mean isolation time for RulesRule was less than the mean detection time. This has to do with the way the metrics are calculated. The detection time is undefined for scenarios with a false positive; however, the isolation time is not necessarily undefined and is calculated as discussed in Section 4.2. The intent is to account for the situation where a DA retracts a spurious detection signal and subsequently isolates to the correct component. In this case the scenario is declared a false positive but the accuracy and timing of the isolation are calculated with respect to the last persistent diagnosis. Consequently, for DAs with many false positives the detection time may be calculated for fewer scenarios than the isolation time, with the result that the mean isolation time over all scenarios could be less than the mean detection time. However, in any scenario where both times are defined, the DA isolation time is always greater than or equal to the detection time, as would be expected.
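A small sketch of the averaging convention just described (detection time averaged only over scenarios where it is defined, isolation time still scored from the last persistent diagnosis) shows how the mean isolation time can end up below the mean detection time; the numbers are invented for illustration.

```python
# Illustration of the averaging convention described above: detection time is
# undefined (None) for false-positive scenarios, while isolation time may still
# be defined from the last persistent diagnosis. Values are invented.
def mean_defined(values):
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else float("nan")

t_det = [None, None, 9.0]      # two false-positive runs: detection undefined
t_iso = [1.0,  1.5,  10.0]     # isolation still scored on all three runs

print(mean_defined(t_det))     # 9.0  (mean over the single defined value)
print(mean_defined(t_iso))     # ~4.17 -> mean isolation < mean detection
```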


Figure 12: ADAPT-Lite classification errors by component/fault type.

Note that the same DA was implemented by two different modelers for ADAPT-Lite. HyDE-A was modeled primarily with the larger and more complex ADAPT in mind and had a policy of waiting for transients to settle before requesting a diagnosis. The same policy was applied to ADAPT-Lite as well, even though transients in ADAPT-Lite corresponded strictly to fault events; this prevented false positives in ADAPT but negatively impacted the timing metric in ADAPT-Lite. On the other hand, HyDE-S was modeled only for ADAPT-Lite and did not include a lengthy time-out period for transients to settle. HyDE-S had dramatically smaller mean detection and isolation times (Figure 7) with roughly the same number of classification errors (Figure 6) as HyDE-A. This illustrates the impact that modeling and implementation decisions have on DA performance. While this gives some insight into trade-offs present in building models, in this work we did not define metrics that directly address the ease or difficulty of building models of sufficient fidelity for the diagnosis task at hand.

Figure 8 shows the CPU and memory usage. Note that significant differences were evident in the peak memory usage metric when run on Linux versus Windows. The cause for this was not explored due to time constraints, as the method used on Windows for calculating peak memory usage involved a Windows API system call, the analysis of which was deemed too expensive. RODON was the only Java DA that was run on Windows, which adversely affected its memory usage metric.

We now discuss the benchmarking results for the larger ADAPT system. Figure 13 shows the false positive rate, false negative rate, and detection accuracy.

Figure 13: ADAPT false positive rate, false negative rate, and detection accuracy by DA.

Figure 15: ADAPT detection and isolation times by DA.


Figure 14: ADAPT classification errors by DA.

The comments in the ADAPT-Lite discussion about noise and sensor-stuck faults apply here as well. Additionally, false positives also resulted from nominal commanded mode changes in which the relay feedback did not change status as of the next data sample after the command. Here is an extract from one of the input scenario files that illustrates this situation:

command @120950 EY275CL = false;
sensors @121001 {...ESH275 = true, ...}
sensors @121501 {...ESH275 = false, ...}

A command is given at 120.95 seconds to open relay EY275. The associated relay position sensor does not indicate open as of the next sensor data update 51 milliseconds later. This is nominal behavior for the system. A DA that does not account for this delay will indicate a false positive in this case.

Figure 16: ADAPT CPU time and peak memory usage by DA.

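One way a DA can avoid this class of false positive is to allow a short grace period after each command before treating a mismatched feedback sensor as symptomatic. The sketch below illustrates that check; the 0.5-second window and the data layout are assumptions for the example, not part of the benchmarked algorithms.

```python
# Sketch of a grace-window check for commanded relay transitions, illustrating
# how a DA might avoid the false positive described above. The 0.5 s window
# and the data layout are assumptions for the example.
def feedback_mismatch_is_fault(cmd_time_s, expected_state, feedback, grace_s=0.5):
    """feedback: list of (time_s, state). Flag a fault only if the position
    sensor still disagrees with the command after the grace window."""
    for t, state in feedback:
        if t < cmd_time_s:
            continue
        if state == expected_state:
            return False                  # feedback caught up: nominal
        if t - cmd_time_s > grace_s:
            return True                   # persistent mismatch: symptomatic
    return False                          # not enough evidence yet

# From the scenario extract: EY275 commanded open at 120.95 s; ESH275 still
# reads closed at 121.001 s but opens by 121.501 s -> no fault is declared.
fb = [(121.001, "closed"), (121.501, "open")]
print(feedback_mismatch_is_fault(120.95, "open", fb))   # False
```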

The number of classification errors for each DA is shown in Figure 14. There were significantly more errors when compared to ADAPT-Lite. One reason is simply that there were more fault scenarios overall, resulting in more than four times as many faults injected in ADAPT compared to ADAPT-Lite. Another reason is the presence of multiple faults. Figure 17 shows the breakdown of classification errors by the number of faults in the scenario. The errors in the no-fault scenarios were obviously due to false positives. The double and triple faults are lumped into one multiple-fault category. Note that there were an equal number of single-fault and multiple-fault scenarios, 40. Counting the number of faults injected in the scenarios, there were 40 total faults in the single-fault scenarios and 90 total faults in the multiple-fault scenarios. A DA that did not return any diagnoses at all would have 40 errors for the single-fault case and 90 errors for multiple faults. This is essentially what happened with FaultBuster because the diagnoses returned were not consistent with the fault catalog and they were treated as empty candidates (the same was true for FaultBuster in ADAPT-Lite).


Figure 18 shows the errors by fault type.


Figure 17: ADAPT classification errors per number of faults by DA.


Figure 18: ADAPT classification errors by fault type.

The errors in the multiple fault scenarios were evenly divided among the faults; for example, if there were four classification errors in a scenario where two faults were injected, each fault was assigned two errors. We also did a more thorough assessment in which each diagnosis candidate was examined and classification errors were assigned to fault categories based on an understanding of which sensors are affected by the faults. The results are similar to evenly dividing the errors among the faults and are not shown here. The category ‘Loads’ includes faults in the fan, lights, pumps, and resistors. The category ‘Sensor’ includes faults in current, voltage, light, temperature, phase angle, and switch position sensors. Once again, there were many errors due to not distinguishing between offset and stuck faults and not detecting stuck faults that were within the nominal range of the sensor.

Figure 15 shows the detection and isolation times. The times are within the same order of magnitude for the different DAs. Some DAs have isolation times that are similar to their detection times while others show isolation times that are much greater than the detection times. This could reflect differences in reasoning strategies or differences in policies for when to declare an isolation based on accumulated evidence.

The CPU and memory usage are shown in Figure 16. The same comment made previously for RODON in regards to memory usage applies here. The convex optimization approach applied in the Stanford DA (Gorinevsky and Poll, 2009) and the compiled arithmetic circuit in the ProADAPT DA (Ricks and Mengshoel, 2009) lead to very low CPU usage.

6 LIMITATIONS AND FUTURE WORK

We would like to extend the benchmarking efforts in several directions. Some obvious candidates are adding new systems, new DAs, and new metrics to the scope. The primary goal of the work described in this paper was to benchmark the available set of DAs on the ADAPT system. As a result we made several simplifying assumptions. We also ran into several issues during the course of this implementation that could not be addressed. In this section, we try to present those assumptions and issues, which we hope can be addressed in future work.

6.1 ADAPT System Scenarios

We intentionally limited the scope of the scenarios used in the benchmarking in order to focus on the performance of the DAs. Although we would like to include variable loads in future experiments, there were none used in this benchmarking activity. We limited ourselves to abrupt parametric and discrete faults. Faults were inserted assuming uniform probabilities and included component and sensor faults only. We plan to introduce other fault types (incipient, intermittent, and noise) in the future. We also understand that a true test would simulate operating conditions of the real system, i.e., the system operates nominally for long periods of time and failures occur periodically following the prior probability of failure distribution. It was also assumed that all sensor data was available to the DAs at all time steps. In the future we would like to relax this assumption and provide only a subset of the sensor data, possibly at differing sampling rates.

6.2 Metrics

Selecting the set of metrics to be used for benchmarking was a challenging job. We based our decision on the system and the kinds of faults we were dealing with. In reality we also need to design metrics more closely associated with the context of use. One common metric is to minimize the total cost of repair, where cost includes down time to the customer, the diagnostician's time, parts, etc. In addition, since we were dealing with abrupt, persistent, and discrete faults, metrics associated with incipient, intermittent, and/or continuous faults were not considered. The metrics listed in this paper do not capture the amount of effort necessary to build models of sufficient fidelity for the diagnosis task at hand. Furthermore, we did not attempt to investigate the ease or difficulty of updating models with new or changed system information. The art of building models is an important practical consideration which is not addressed in the current work.


Table 4: ADAPT-Lite DA Results

Metric        RODON   Wizards of Oz  FaultBuster  ProADAPT  HyDE-A  HyDE-S  RulesRule  FACT    RacerX
FP Rate       0.0645  0.0000         0.1333       0.0333    0.0000  0.2000  0.8246     0.2813  0.0645
FN Rate       0.0968  0.5000         0.3438       0.0313    0.4688  0.0741  0.0000     0.0667  0.1613
Det Acc       0.9194  0.7419         0.7581       0.9677    0.7581  0.8548  0.2419     0.8226  0.8871
Class Errors  10.000  24.000         32.000       2.000     26.649  26.000  76.000     25.000  32.000
Tdet (ms)     218     11530          1893         1392      13223   130     1000       373     126
Tiso (ms)     7205    11626          9259         4084      13840   653     282        9796    999999
CPU (ms)      11766   1039           2039         1601      24795   513     117        1767    139
Mem (kb)      26679   1781           2539         1680      5447    5795    3788       4340    3572

Table 5: ADAPT DA Results

Metric        RODON   Wizards of Oz  FaultBuster  ProADAPT  HyDE     Stanford
FP Rate       0.5417  0.5106         0.8143       0.0732    0.0000   0.3256
FN Rate       0.0972  0.0959         0.2400       0.1392    0.3000   0.0519
Det Acc       0.7250  0.7417         0.4250       0.8833    0.8000   0.8500
Class Errors  84.067  159.248        130.000      76.000    121.569  110.547
Tdet (ms)     3490    30742          14099        5981      17610    3946
Tiso (ms)     36331   47625          37808        12486     21982    14103
CPU (ms)      80261   23387          5798         3416      29612    963
Mem (kb)      29878   7498           10261        6539      20515    5912


In future work, we would like to determine a set of application-specific use cases (maintenance, autonomous operation, abort decision, etc.) that the DA is supporting and select metrics that would be relevant to that use case. For example, in an abort decision-support use case the detection time would be of utmost importance, whereas in an autonomous operation use case isolation accuracy would be more important.

6.3 Runtime Architecture

Some practical issues arose in the execution of experiments. Much effort was put into ensuring stable, uniform conditions on the host machines; however, due to time constraints and the unpredictable element introduced by running DAs developed externally, it was necessary to take measures that may have caused slight variability.

7 CONCLUSION

We presented the results from our effort to benchmark a set of diagnostic algorithms on the ADAPT electrical power system testbed. We learned some valuable lessons in trying to complete this effort. One major takeaway is that there is still a lot of work and discussion needed to determine a common comparison and evaluation framework for the diagnosis community. The other key observation is that no DA was able to be best in a majority of the metrics. This clearly indicates that the selection of DAs would necessarily involve a trade-off analysis between various performance metrics.

As outlined in the previous section, we have identified some of the limitations of our work. We are already in the process of planning continued experiments on the ADAPT system. We are also trying to identify other real systems that might be used for benchmarking, since we believe that the performance of the DAs is also tied to the system being diagnosed. Our long-term goal is to create a database of benchmarking results which presents the performance of a growing set of DAs on a growing set of application domains. We hope that such a database would help mission managers and operations personnel perform trade-off studies to determine which DAs they could consider for deployment on their systems.

ACKNOWLEDGMENTS

We extend our gratitude to Johan de Kleer (PARC), Alexander Feldman (Delft University of Technology), Lukas Kuhn (PARC), Serdar Uckun (PARC), Peter Struss (Technical University Munich), Gautam Biswas (Vanderbilt University), Ole Mengshoel (Carnegie Mellon University), Kai Goebel (University Space Research Association), Gregory Provan (University College Cork), and many others for valuable discussions in establishing the benchmarking framework.


In addition, we extend our gratitude to David Nishikawa (NASA), David Jensen (Oregon State University), Brian Ricks (University of Texas at Dallas), Adam Sweet (NASA), David Hall (Stinger Ghaffarian Technologies), and many others for supporting the benchmarking activity reported here.

REFERENCES

(Bartys et al., 2006) M. Bartys, R. Patton, M. Syfert, S. de las Heras, and J. Quevedo. Introduction to the DAMADICS actuator FDI benchmark study. 2006.

(Basseville and Nikiforov, 1993) M. Basseville and I. Nikiforov. Detection of Abrupt Changes. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1993.

(de Freitas, 2001) N. de Freitas. Rao-Blackwellised particle filtering for fault diagnosis. In Proceedings of the IEEE Aerospace Conference (AEROCONF'01), 2001.

(Gertler and Inc., 1998) Janos J. Gertler and NetLibrary Inc. Fault Detection and Diagnosis in Engineering Systems [electronic resource]. New York: Marcel Dekker, 1998.

(Gorinevsky and Poll, 2009) D. Gorinevsky and S. Poll. Estimation of faults in DC electrical power system. In Proceedings of the American Control Conference, 2009.

(Hamscher et al., 1992) W. Hamscher, L. Console, and J. de Kleer. Readings in Model-Based Diagnosis. Morgan Kaufmann, San Mateo, CA, 1992.

(Iverson, 2004) D. Iverson. Inductive system health monitoring. In Proceedings of the 2004 International Conference on Artificial Intelligence (ICAI), 2004.

(Karin et al., 2006) L. Karin, R. Lunde, and B. Munker. Model-based failure analysis with RODON. In Proceedings of the 17th European Conference on Artificial Intelligence (ECAI'06), 2006.

(Kavcic and Juricic, 1997) M. Kavcic and D. Juricic. A prototyping tool for fault tree based process diagnosis. In Proceedings of the 8th International Workshop on Principles of Diagnosis (DX'97), 1997.

(Kostelezky et al., 1990) W. Kostelezky, W. Krautter, R. Skuppin, M. Steinert, and R. Weber. The rule-based expert system Promotex I. Technical Report 2, ESPRIT-Project #1106, Stuttgart, 1990.

(Kurtoglu et al., 2008) T. Kurtoglu, O. J. Mengshoel, and S. Poll. A framework for systematic benchmarking of monitoring and diagnostic systems. In Annual Conference of the Prognostics and Health Management Society (PHM'08), 2008.

(Kurtoglu et al., 2009a) T. Kurtoglu, S. Narasimhan, S. Poll, D. Garcia, L. Kuhn, J. de Kleer, A. van Gemund, and A. Feldman. A framework for systematic benchmarking of monitoring and diagnostic systems. In Proceedings of the 20th International Workshop on Principles of Diagnosis (DX'09), pages 383-396, 2009.

(Kurtoglu et al., 2009b) T. Kurtoglu, S. Narasimhan, S. Poll, D. Garcia, L. Kuhn, J. de Kleer, A. van Gemund, and A. Feldman. Towards a framework for evaluating and comparing diagnosis algorithms. In Proceedings of the 20th International Workshop on Principles of Diagnosis (DX'09), pages 373-382, 2009.

(Lerner et al., 2000) U. Lerner, R. Parr, D. Koller, and G. Biswas. Bayesian fault detection and diagnosis in dynamic systems. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI'00), pages 531-537, 2000.

(Mengshoel, 2007) O. J. Mengshoel. Designing resource-bounded reasoners using Bayesian networks: System health monitoring and diagnosis. In Proceedings of the 18th International Workshop on Principles of Diagnosis (DX'07), pages 330-337, 2007.

(Narasimhan and Brownston, 2007) S. Narasimhan and L. Brownston. HyDE - a general framework for stochastic and hybrid model-based diagnosis. In Proceedings of the 18th International Workshop on Principles of Diagnosis (DX'07), pages 162-169, 2007.

(Orsagh et al., 2002) R. Orsagh, M. Roemer, C. Savage, and M. Lebold. Development of performance and effectiveness metrics for gas turbine diagnostic techniques. In Proceedings of the IEEE Aerospace Conference (AEROCONF'02), pages 2825-2834, 2002.

(Poll et al., 2007) S. Poll, A. Patterson-Hine, J. Camisa, D. Garcia, D. Hall, C. Lee, O. J. Mengshoel, C. Neukom, D. Nishikawa, J. Ossenfort, A. Sweet, S. Yentus, I. Roychoudhury, M. Daigle, G. Biswas, and X. Koutsoukos. Advanced diagnostics and prognostics testbed. In Proceedings of the 18th International Workshop on Principles of Diagnosis (DX'07), 2007.

(Ricks and Mengshoel, 2009) B. Ricks and O. Mengshoel. The diagnostic challenge competition: Probabilistic techniques for fault diagnosis in electrical power systems. In Proceedings of the 20th International Workshop on Principles of Diagnosis (DX'09), pages 415-422, 2009.

(Roemer et al., 2005) M. Roemer, J. Dzakowic, R. Orsagh, C. Byington, and G. Vachtsevanos. Validation and verification of prognostic health management technologies. In Proceedings of the IEEE Aerospace Conference (AEROCONF'05), 2005.

(Roychoudhury et al., 2009) I. Roychoudhury, G. Biswas, and X. Koutsoukos. Designing distributed diagnosers for complex continuous systems. 2009.

(Simon et al., 2008) L. Simon, J. Bird, C. Davison, A. Volponi, and R. E. Iverson. Benchmarking gas path diagnostic methods: A public approach. In Proceedings of the ASME Turbo Expo 2008: Power for Land, Sea and Air, GT, 2008.

(Society of Automotive Engineers, 2007) Society of Automotive Engineers E-32. Health and Usage Monitoring Metrics, Monitoring the Monitor, SAE ARP 5783-DRAFT, February 14, 2007.

(Sorsa and Koivo, 1998) T. Sorsa and H. Koivo. Application of artificial neural networks in process fault diagnosis. 29(4):843-849, 1998.

(Zymnis et al., 2009) A. Zymnis, S. Boyd, and D. Gorinevsky. Relaxed maximum a posteriori fault identification. 2009.

Tolga Kurtoglu is a Research Scientist with Mission Critical Technologies at the Intelligent Systems Division of the NASA Ames Research Center working for the Systems Health Management group. His research focuses on the development of prognostic and health management systems, model-based diagnosis, design automation and optimization, and risk- and reliability-based design. He received his Ph.D. in Mechanical Engineering from the University of Texas at Austin in 2007 and has an M.S. degree in the same field from Carnegie Mellon University. Dr. Kurtoglu has published over 40 articles and papers in various journals and conferences and is an active member of ASME, ASEE, AIAA, and AAAI. Prior to his work with NASA, he worked as a professional design engineer at Dell Corporation in Austin, Texas.

Sriram Narasimhan is a Computer Scientist with the University of California, Santa Cruz, working as a contractor at NASA Ames Research Center in the Discovery and Systems Health area. His research interests are in model-based diagnosis with a focus on hybrid and stochastic systems. He is the technical lead for the Hybrid Diagnosis Engine (HyDE) project. He received his M.S. and Ph.D. in Electrical Engineering and Computer Science from Vanderbilt University. He also has an M.S. in Economics from Birla Institute of Technology and Science.

Scott Poll is a Research Engineer with the National Aeronautics and Space Administration (NASA) Ames Research Center, Moffett Field, CA, where he is the deputy lead for the Diagnostics and Prognostics Group in the Intelligent Systems Division. He is co-leading the evolution of a laboratory designed to enable the development, maturation, and benchmarking of diagnostic, prognostic, and decision technologies for system health management applications. He was previously the Associate Principal Investigator for Prognostics in the Integrated Vehicle Health Management Project in NASA's Aviation Safety Program. He received the BSE degree in Aerospace Engineering from the University of Michigan in 1994, and the MS degree in Aeronautical Engineering from the California Institute of Technology in 1995.

David Garcia is a Computer Programmer with Stinger Ghaffarian Technologies, working at NASA Ames Research Center in the Diagnostics and Prognostics Group (Intelligent Systems Division). He received his BS in Mathematics from Santa Clara University in 2006. He is the software lead for the ADAPT project, and designed and implemented the DXC Framework.

Stephanie Wright obtained her BS degree in Physics from Seattle University in 2008 and is currently a graduate student at Vanderbilt University in the EECS department. At Vanderbilt University she is a research assistant in the MACS (Modeling and Analysis of Complex Systems) group and her research interests are in diagnostics and health management of complex systems.
