

Fault Diagnosis in Discrete Event Systems Modeled by Partially Observed Petri Nets

Yu Ru · Christoforos N. Hadjicostis

Received: date / Accepted: date

Abstract In this paper, we study fault diagnosis in discrete event systems modeled by partially observed Petri nets, i.e., Petri nets equipped with sensors that allow observation of the number of tokens in some of the places and/or partial observation of the firing of some of the transitions. We assume that the Petri net model is accompanied by a (possibly implicit) description of the likelihood of each firing sequence. Faults are modeled as unobservable transitions and are divided into different types. Given an ordered sequence of observations from place and transition sensors, our goal is to calculate the belief (namely, the degree of confidence) regarding the occurrence of faults belonging to each type. To handle information from transition and place sensors in a unified manner, we transform a given partially observed Petri net into an equivalent (as far as state estimation and fault diagnosis are concerned) labeled Petri net (i.e., a Petri net with only transition sensors), and construct a translator that translates the sensing information from place and transition sensors into a sequence of labels in the equivalent labeled Petri net. Then we focus on the computation of beliefs on faults in a given labeled Petri net and construct an online monitor that recursively produces these beliefs by tracking the existence of faulty transitions in execution paths that match the sequence of labels observed so far. Using the transformed Petri net and the translated observation sequence, we can then compute the belief for each fault type in partially observed Petri nets in the same way as in labeled Petri nets.

This work was supported in part by the National Science Foundation (USA) under NSF ITR Award 0426831. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of NSF.

Yu Ru
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, IL, USA
E-mail: [email protected]

Christoforos N. Hadjicostis
Department of Electrical and Computer Engineering
University of Cyprus and University of Illinois at Urbana-Champaign
Address for Correspondence: University of Cyprus, 75 Kallipoleos Ave, P.O. Box 20537, 1678 Nicosia, Cyprus
Tel: +357 22-892231
Fax: +357 22-892260
E-mail: [email protected]

Keywords Discrete Event Systems · Petri Nets · Fault Diagnosis · Partial Observation

1 Introduction

A discrete event system (DES) is a dynamic system that evolves in accordance with the abrupt occurrence, at possibly unknown and irregular intervals, of physical events (Ramadge and Wonham, 1989; Cassandras and Lafortune, 2008). Such systems arise in a variety of contexts, ranging from manufacturing and robotics to vehicular traffic, computer systems, and communication networks. As DESs become more complicated and widespread, failures appear more often and their consequences become potentially severe; as a result, fault diagnosis has emerged as an extremely important task in many real applications.

One of the most extensively studied fault models is the one where faults are modeled as unobservable events, or unobservable state transitions. Following this fault model, the authors of (Sampath et al, 1995) focused on DESs modeled by finite state machines, introduced the notions of fault types and diagnosability, and designed fault diagnosers to test for diagnosability and implement online fault diagnosis. Later on, their work was extended to DESs modeled by Petri nets (Ushio et al, 1998; Chung, 2005; Genc and Lafortune, 2007). Specifically, the authors of (Ushio et al, 1998; Chung, 2005) constructed fault diagnosers assuming the observation of marking variations in certain places; they followed an approach similar to the one in (Sampath et al, 1995) so that previous results could be directly applied. In (Genc and Lafortune, 2007), a distributed version of the diagnoser approach by Sampath et al. was proposed for place-bordered Petri nets. In contrast to these direct applications of the diagnoser approach of (Sampath et al, 1995), Giua et al. constructed basis reachability trees for bounded labeled Petri nets (based on the notions of basis markings and justifications) so that faults can be detected (Giua and Seatzu, 2005). In (Lefebvre and Delherm, 2007), minimal diagnosers (that use observations from a minimum number of observable places) are constructed to immediately detect and isolate the firing of fault transitions. There are also other fault diagnosis methods (for different fault models) that are based on algebraic coding techniques (Hadjicostis and Verghese, 1999; Wu and Hadjicostis, 2005), net unfolding techniques (Aghasaryan et al, 1998; Benveniste et al, 2003b), interpreted Petri net formulations (Ramírez-Treviño et al, 2007), and others.

In this paper, we adopt the above-mentioned fault model where faults are modeled as unobservable state transitions. More specifically, we consider fault diagnosis in partially observed Petri nets, i.e., Petri nets with both place sensors, which can measure the number of tokens in some of the places (and can therefore give marking variations as required in (Ushio et al, 1998; Chung, 2005)), and transition sensors (in the form of labels) which can (possibly partially) indicate the firing of some of the transitions. We handle place sensors and transition sensors in a unified manner by transforming the partially observed Petri net into a labeled Petri net (i.e., a Petri net with only transition sensors) and by constructing a translator to translate the sensing information from the plant (which includes both marking variations and transition labels) into a sequence of labels; thus, unlike (Ushio et al, 1998; Chung, 2005), we do not use marking variations directly. Once this reduction is established, we focus on labeled Petri nets and construct an online monitor to compute the belief we have regarding the occurrence of each fault type. Given a sequence of observations, the belief of a particular fault type is a measure of our confidence regarding the occurrence of faults of that type and is defined as the ratio of the sum of the weights of possible paths¹ that contain that fault type over the sum of the weights of all possible paths. The weight of a path can be viewed as a measure of its likelihood and can be a function of the transitions involved, the times at which they occur, or other factors. When a path's weight is taken to be the product of the weights of its individual transitions, we show that the proposed monitor can be implemented recursively with complexity that is polynomial in the length of the observed sequence of labels.

The contributions of this paper can be summarized as follows: i) fault diagnosis is studied in a setting more general than in (Ushio et al, 1998; Chung, 2005; Giua and Seatzu, 2005; Lefebvre and Delherm, 2007); ii) a transformation scheme is proposed for partially observed Petri nets so that marking variations can be treated as refinements of existing transition labels; iii) beliefs regarding fault types are introduced to enhance the diagnosis results in (Sampath et al, 1995; Giua and Seatzu, 2005) (one way this can be done is by introducing a weight for each enabled transition at any particular marking; when the weight represents the probability of an enabled transition at a marking, the development in this paper is similar to (Thorsley and Teneketzis, 2005) for diagnosing faults in stochastic automata with the main difference being that the reachable state space in the case of a Petri net is not necessarily finite).

The paper is organized as follows. Section 2 presents basic notation of Petri nets, labeled Petri nets, and partially observed Petri nets. In Section 3 we introduce the notion of belief and formulate the fault diagnosis problem. In Section 4, a transformation scheme from partially observed Petri nets to labeled Petri nets is proposed in order to handle place sensors. In Section 5 we construct an online diagnoser to diagnose faults and output beliefs regarding the occurrence of certain faults in labeled Petri nets. An example is provided in Section 6 to demonstrate the diagnoser construction. Conclusions can be found in Section 7.

2 Preliminaries

In this section, we review basic definitions of Petri nets (Murata, 1989; Cassandras and Lafortune, 2008), labeled Petri nets (Peterson, 1981), and partially observed Petri nets (Ru and Hadjicostis, 2009b).

Definition 1 A Petri net structure N is a 4-tuple N = (P, T, F, W), where P = {p1, p2, ..., pn} is a finite set of n places; T = {t1, t2, ..., tm} is a finite set of m transitions; F ⊆ (P × T) ∪ (T × P) is a set of arcs; W : F → ℕ is a weight function, where ℕ is the set of positive integers; P ∩ T = ∅ and P ∪ T ≠ ∅.

A marking is a function M : P → ℕ ∪ {0} that assigns to each place a nonnegative integer number of tokens. Pictorially, places are represented by circles, transitions by bars, and tokens by black dots, as shown in Fig. 1. We denote by M(p) the number of tokens in place p. A Petri net G = (N, M0) is a net structure N with an initial marking M0. A Petri net is acyclic if it has no directed circuits. The set of all input (or output) places of a transition t ∈ T is defined as •t = {p ∈ P | (p, t) ∈ F} (or t• = {p ∈ P | (t, p) ∈ F}). A transition is a source transition if •t = ∅.

Fig. 1 A partially observed Petri net Q with place and transition sensors.

¹ Possible paths are sequences of transitions that are consistent with a given sequence of observations.

Given a Petri net, a transition t is enabled at marking M if ∀p ∈ •t, M(p) ≥ W(p, t); this is denoted by M[t〉. An enabled transition t may fire, and its firing removes W(p, t) tokens from each input place p and adds W(t, p′) tokens to each output place p′, resulting in a marking M′; this is denoted by M[t〉M′. In this paper, we assume that at most one transition can fire at any instant. A k-length firing sequence S from marking M is a sequence of transitions S = t_{s1} t_{s2} · · · t_{sk}, t_{si} ∈ T, such that M[t_{s1}〉M1[t_{s2}〉M2 · · · [t_{sk}〉M′; this is denoted by M[S〉M′. The marking M′ can also be written as

M′ = M + Dσ,

where (i) D is the n × m incidence matrix of N satisfying D(i, j) = −W(pi, tj) + W(tj, pi) (if W(pi, tj) or W(tj, pi) is not defined for a specific place pi and transition tj, it is taken to be 0), and (ii) σ is the m × 1 firing vector of S with its ith entry representing the number of times transition ti appears in S.
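The firing rule and the state equation can be checked numerically. The following is a minimal Python sketch and not part of the original development: it models only a two-transition fragment (t2 and t5) whose incidence columns are the ones implied by the markings reported in Example 2. The arc weights in `PRE` and `POST` (all equal to 1, no self-loops) and all function names are assumptions made for illustration, since Fig. 1 is not reproduced here.

```python
# Hypothetical fragment with 4 places (p1..p4) and 2 transitions (t2, t5).
# PRE[t][i] = W(p_i, t) and POST[t][i] = W(t, p_i); the weights are assumed.
PRE = {"t2": [1, 0, 0, 0], "t5": [0, 1, 0, 0]}
POST = {"t2": [0, 1, 0, 0], "t5": [0, 0, 0, 1]}


def enabled(marking, t):
    """M[t> holds iff M(p) >= W(p, t) for every place p."""
    return all(m >= w for m, w in zip(marking, PRE[t]))


def fire(marking, t):
    """Return M' with M[t>M'; raise if t is not enabled at M."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled at {marking}")
    return [m - pre + post for m, pre, post in zip(marking, PRE[t], POST[t])]


def fire_sequence(marking, seq):
    """Fire the transitions of seq one at a time."""
    for t in seq:
        marking = fire(marking, t)
    return marking


def state_equation(marking, seq):
    """Compute M + D*sigma, with D = POST - PRE and sigma counting firings."""
    result = list(marking)
    for t in PRE:
        count = seq.count(t)  # entry of the firing vector sigma for t
        for i in range(len(result)):
            result[i] += (POST[t][i] - PRE[t][i]) * count
    return result


M0 = [2, 0, 0, 0]
S = ["t2", "t5"]
print(fire_sequence(M0, S))   # marking reached by firing t2 and then t5
print(state_equation(M0, S))  # the same marking obtained from M + D*sigma
```

Note that the state equation only uses the firing counts: it yields the same vector for any ordering of S, whereas `fire_sequence` additionally checks that each transition is enabled when it fires.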

A marking M′ is reachable from M if there exists a firing sequence S such that M[S〉M′. Given a Petri net, the set of all markings reachable from M0 is called the reachability set and is denoted by R(G, M0). If ∀p ∈ P and ∀M ∈ R(G, M0), M(p) ≤ K for some finite positive integer K, then we say the Petri net is K-bounded or simply bounded. A Petri net is said to be deadlock structurally bounded if there exists an n-dimensional column vector y with positive integer entries such that y^T D < 0_m^T, where y^T denotes the transpose of y, 0_m denotes an m-dimensional column vector with all entries being 0, and the inequality is taken elementwise (Ru and Hadjicostis, 2009a). Acyclic Petri nets without source transitions can be shown to be deadlock structurally bounded (Ru and Hadjicostis, 2009a).

A labeled Petri net is defined as a 3-tuple (G, Σ, L), where G is a Petri net, Σ is the set of labels (also called the alphabet), and L : T → Σ ∪ {ε} is a labeling function that assigns a label (which can be the null label ε) to each transition (Peterson, 1981). A transition t is observable if L(t) ∈ Σ; To is used to denote the set of observable transitions. Given a firing sequence S = t_{s1} t_{s2} · · · t_{sk} in a labeled Petri net, the corresponding observation sequence is ω = L(S) := L(t_{s1})L(t_{s2}) · · · L(t_{sk}), i.e., a string in Σ∗ (the set of all possible strings generated from the alphabet Σ). Given an observation sequence ω generated by a labeled Petri net, there can be multiple firing sequences that can be mapped to ω as well as multiple system states that are consistent with the observation.


Definition 2 Given a labeled Petri net (G, Σ, L) with initial marking M0 and an observed label sequence ω ∈ Σ∗, the set of consistent firing sequences is S(ω) = {S | S ∈ T∗To : M0[S〉 and L(S) = ω}, and the set of consistent markings is C(ω) = {M | ∃S ∈ S(ω) : M0[S〉M}, where T∗To denotes the concatenation of T∗ and To.

Remark 1 The definition of consistent firing sequences requires that the last transition be observable. It has its origin in the definition of Lo(G, x) in (Sampath et al, 1995), where Lo(G, x) is the set of all traces that originate from state x and end at the first observable event. The definition of basis markings used for fault diagnosis in (Giua and Seatzu, 2005) also requires that the last transition be observable. Note that there is no such requirement in the definitions of consistent markings in (Giua et al, 2007; Ru and Hadjicostis, 2009a). In other words, given the same observation sequence ω, the set of consistent markings as defined in this paper is a subset of the set of consistent markings defined in (Giua et al, 2007; Ru and Hadjicostis, 2009a). □
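To make Definition 2 and Remark 1 concrete, the sets S(ω) and C(ω) can be enumerated by brute force on a small net. The sketch below is only an illustration under stated assumptions: it uses a hypothetical two-transition fragment (t2 labeled b, t5 with the null label, unit arc weights), the function names are invented here, and the depth bound `max_len` is an artifact of the sketch that keeps the search finite when unobservable cycles exist.

```python
# Hypothetical labeled fragment: 4 places, transitions t2 (label b) and t5 (null).
PRE = {"t2": (1, 0, 0, 0), "t5": (0, 1, 0, 0)}
POST = {"t2": (0, 1, 0, 0), "t5": (0, 0, 0, 1)}
LABEL = {"t2": "b", "t5": ""}  # "" stands for the null label epsilon


def step(marking, t):
    """Return the marking after firing t, or None if t is not enabled."""
    if any(m < w for m, w in zip(marking, PRE[t])):
        return None
    return tuple(m - a + b for m, a, b in zip(marking, PRE[t], POST[t]))


def consistent(m0, omega, max_len=6, last_observable=True):
    """Enumerate firing sequences S with M0[S> and L(S) = omega.

    omega is a list of labels. With last_observable=True the last transition
    of S must be observable, as required in Definition 2.
    Returns (list of sequences, set of markings reached).
    """
    sequences, markings = [], set()

    def search(marking, seq, seen):
        # seen = number of labels of omega matched so far
        ends_ok = (not last_observable) or bool(seq and LABEL[seq[-1]] != "")
        if seen == len(omega) and ends_ok:
            sequences.append(tuple(seq))
            markings.add(marking)
        if len(seq) == max_len:
            return
        for t in PRE:
            nxt = step(marking, t)
            if nxt is None:
                continue
            if LABEL[t] == "":
                search(nxt, seq + [t], seen)
            elif seen < len(omega) and LABEL[t] == omega[seen]:
                search(nxt, seq + [t], seen + 1)

    search(tuple(m0), [], 0)
    return sequences, markings


strict = consistent((2, 0, 0, 0), ["b"])
relaxed = consistent((2, 0, 0, 0), ["b"], last_observable=False)
print(strict)
print(relaxed)
```

On this fragment the strict variant keeps only the sequence t2, while the relaxed variant also admits t2t5, so the strict set of consistent markings is a subset of the relaxed one, in line with Remark 1.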

To handle transitions with the null label ε, we need to consider the Tε-induced subnet of a labeled Petri net (Giua et al, 2007).

Definition 3 Given a labeled Petri net (G, Σ, L) and Tε ⊆ T, we define the Tε-induced subnet as the net Nε = (Pε, Tε, Fε, Wε), where Pε = {p ∈ P | ∃t ∈ Tε, p ∈ •t ∪ t•}, Fε is the restriction of F to (Pε × Tε) ∪ (Tε × Pε), and Wε is the restriction of W to Fε.

Essentially, a labeled Petri net is a Petri net equipped with transition sensors. More generally, a Petri net which can be equipped with both transition sensors and place sensors is called a partially observed Petri net (Ru and Hadjicostis, 2009b).

Definition 4 A partially observed Petri net Q is a 3-tuple (G, Po, To), where

– G = (N, M0) is a Petri net with n places and m transitions;
– Po ⊆ P is the set of observable places with cardinality n1 satisfying 0 ≤ n1 ≤ n;
– To ⊆ T is the set of observable transitions with cardinality m1 satisfying 0 ≤ m1 ≤ m.

An observable place p ∈ Po can have a sensor (e.g., a vision sensor) that indicates its number of tokens; however, an unobservable place p ∈ Puo = P\Po cannot have such a sensor. One can always rename places to ensure that the first n1 places are observable; therefore, we take Po = {p1, p2, ..., pn1}. A place sensor configuration V is a vector (v1 v2 ... vn1)^T, where vi = 0 if no sensor is on place pi and vi = 1 otherwise. ||V|| := ∑_{i=1}^{n1} vi ≤ n1 denotes the total number of place sensors in the place sensor configuration V.

An observable transition t ∈ To can have a sensor (e.g., a motion sensor) that indicates when a transition within a given subset of transitions has fired; however, an unobservable transition t ∈ Tuo = T\To cannot have such a sensor associated with it. The association between sensors and transitions is captured by a labeling function L : T → Σ ∪ {ε} that assigns a label to each transition. Unlike the fixed labeling function in labeled Petri nets, the labeling function L in a partially observed Petri net can be reconfigured, subject to the constraint that unobservable transitions must be assigned the null label (i.e., L(t) = ε for all t ∈ Tuo).

If L(t1) = L(t2) = e ∈ Σ, the firings of t1 and t2 might not be distinguished solely by the observed label e; if L(t) = ε, then the firing of transition t is not observed. For each e in Σ ∪ {ε}, we define Te = {t ∈ T : L(t) = e}. The transition (or transitions) in Te with |Te| = 1 (or |Te| ≥ 2) for e ∈ Σ is (or are) said to be deterministic (or nondeterministic).


Example 1 The net in Fig. 1 is a partially observed Petri net Q with place and transition sensors. Here Po = {p1, p2, p3} (unobservable place p4 is drawn as a shadowed circle) and the place sensor configuration is V = (0 1 0)^T; To = {t1, t2, t3} (unobservable transitions t4, t5 are drawn as shadowed bars) and the labeling function L is L(t1) = a, L(t2) = L(t3) = b, L(t4) = L(t5) = ε. □
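The sets Te and the deterministic/nondeterministic distinction for the labeling of Example 1 can be computed mechanically. The short Python sketch below is illustrative only; the names `transitions_by_label` and `classify`, and the tag "unlabeled" for transitions carrying the null label, are assumptions of the sketch rather than terminology from the paper.

```python
from collections import defaultdict

EPS = "ε"  # null label
# Labeling function L of Example 1.
LABELING = {"t1": "a", "t2": "b", "t3": "b", "t4": EPS, "t5": EPS}


def transitions_by_label(labeling):
    """Return {e: T_e}, where T_e = {t : L(t) = e}."""
    groups = defaultdict(list)
    for t, e in labeling.items():
        groups[e].append(t)
    return dict(groups)


def classify(labeling):
    """Tag each transition as deterministic, nondeterministic or unlabeled."""
    kinds = {}
    for e, ts in transitions_by_label(labeling).items():
        if e == EPS:
            kind = "unlabeled"  # firing produces no label
        elif len(ts) == 1:
            kind = "deterministic"
        else:
            kind = "nondeterministic"
        for t in ts:
            kinds[t] = kind
    return kinds


print(transitions_by_label(LABELING))
print(classify(LABELING))
```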

3 Problem Formulation

In this section, we define the notion of belief and formulate the fault diagnosis problem in partially observed Petri nets.

3.1 Observation in Partially Observed Petri Nets

Observations in partially observed Petri nets with place sensor configuration V and labeling function L are driven by token changes from place sensors and/or observable transition labels. More specifically, suppose there is a state transition M[t〉M′; then the observation from sensors is M_V → L(t) → M′_V if L(t) ∈ Σ, or if L(t) = ε but M_V is not identical to M′_V; otherwise, the observation is null. Here M_V denotes the ||V|| × 1 vector derived from M with entries being the number of tokens in places with sensors, and → denotes the temporal order of observations. More formally, we define L_V(M[t〉M′) = εp if M_V = M′_V and L(t) = ε, and L_V(M[t〉M′) = M_V → L(t) → M′_V otherwise. Here εp is similar to ε in labeled Petri nets and means the observation is null.

Given a partially observed Petri net Q with place sensor configuration V and labeling function L, and a firing sequence S = t_{s1} t_{s2} · · · t_{sk} with a corresponding system trajectory M0[t_{s1}〉M1[t_{s2}〉M2 · · · [t_{sk}〉Mk, the corresponding observation sequence is of the form

ωp = L_V(S) := L_V(M0[t_{s1}〉M1) L_V(M1[t_{s2}〉M2) · · · L_V(Mk−1[t_{sk}〉Mk).

Here the concatenation of L_V(·)'s is defined in the following way: (M_V → e1 → M′_V)(M′_V → e2 → M″_V) = M_V → e1 → M′_V → e2 → M″_V, and (M_V → e1 → M′_V)εp = εp(M_V → e1 → M′_V) = M_V → e1 → M′_V.

Similar to consistent firing sequences and consistent markings defined for labeled Petri nets, we can also define these two concepts for partially observed Petri nets.

Definition 5 Given a partially observed Petri net Q with place sensor configuration V, labeling function L, and an observation sequence ωp, the set of consistent firing sequences is Sp(ωp) = {S | S ∈ T∗T′o : M0[S〉 and L_V(S) = ωp}, and the set of consistent markings is Cp(ωp) = {M | ∃S ∈ Sp(ωp) : M0[S〉M}, where T′o ⊆ T and, for all t ∈ T′o, the firing of t generates token changes at place sensors and/or an observable transition label.

In this paper, faults are modeled as unobservable transitions as in (Sampath et al, 1995). We first use the following example to illustrate how we can diagnose faults in partially observed Petri nets.

Example 2 (continued) In Fig. 1, let transition t5 be the only fault transition which needs to be detected. If the firing sequence is S = t2t5, then the system trajectory is

M0[t2〉M1[t5〉M2,

where M0 = [2 0 0 0]^T, M1 = [1 1 0 0]^T and M2 = [1 0 0 1]^T; the corresponding observation from place sensors and transition sensors is

[0] → b → [1] → ε → [0].

Based on the observation, we can deduce that the fault transition t5 must have occurred because only the firing of t5 can decrease the number of tokens in p2 by 1 and at the same time not generate any label. Note that the observation is driven by token changes in observable places with sensors and/or observed labels, and if the observed label is ε, it means that there is no observation output from transition sensors. If a place sensor is on p3 instead of p2 and the firing sequence is still t2t5, then the observation is 0 → b → 0. It can be verified that S(0 → b → 0) = {t2, t2t5}. In this case, it is not possible to determine whether fault transition t5 has occurred or not without further sensing information. □
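The two observations in Example 2 can be reproduced by applying the map L_V to the trajectory M0[t2〉M1[t5〉M2. The following Python sketch is a hedged illustration: the function names `project` and `observe`, the flat-list encoding of an observation sequence, and the 0-based place indices are choices made here, not notation from the paper.

```python
EPS = "ε"  # null label


def project(marking, sensed):
    """M_V: token counts of the places that carry a sensor (0-based indices)."""
    return [marking[i] for i in sensed]


def observe(trajectory, labels, sensed):
    """Build the observation sequence L_V(S) as a flat list.

    trajectory: list of (M, t, M') triples along the firing sequence
    labels:     dict mapping each transition to its label (EPS if null)
    sensed:     indices of the places equipped with sensors
    """
    obs = []
    for m, t, m_next in trajectory:
        mv, mv_next = project(m, sensed), project(m_next, sensed)
        if labels[t] == EPS and mv == mv_next:
            continue  # null observation: no label and no visible token change
        if not obs:
            obs.append(mv)  # the first observation unit contributes its M_V
        obs.extend([labels[t], mv_next])  # shared markings are not repeated
    return obs


M0, M1, M2 = [2, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 1]
trajectory = [(M0, "t2", M1), (M1, "t5", M2)]
labels = {"t2": "b", "t5": EPS}

sensor_on_p2 = observe(trajectory, labels, sensed=[1])
sensor_on_p3 = observe(trajectory, labels, sensed=[2])
print(sensor_on_p2)
print(sensor_on_p3)
```

With the sensor on p2 the firing of t5 leaves a visible trace, whereas with the sensor on p3 it produces a null observation and is dropped from the sequence.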

In the above example, the diagnosis output is ambiguous if the place sensor is on p3. If, however, we could have a measure of the likelihood of each firing sequence S that is consistent with the observation ωp, then we could obtain a measure of how confident we are about the occurrence of faults. This is the focus of the next subsection.

3.2 Notion of Belief

As mentioned earlier, faults are modeled as unobservable transitions and are partitioned into q types ∆F = {F1, F2, · · · , Fq}. Let T_F be the set of fault transitions, and T_{Fi} be the set of fault transitions whose type is Fi. Then we have (i) T_F ⊆ Tuo; (ii) T_F = T_{F1} ∪ T_{F2} ∪ · · · ∪ T_{Fq}; (iii) T_{Fi} ∩ T_{Fj} = ∅ if i, j ∈ {1, 2, ..., q} and i ≠ j.

In previous approaches (e.g., (Sampath et al, 1995; Giua and Seatzu, 2005)), when there are many execution paths that are consistent with a given observation sequence, the diagnosis result provides very coarse information. For example, in (Sampath et al, 1995), there are three types of labels that can be associated with each system state consistent with the observation sequence: label 'A' is used when there is ambiguity about the occurrence of certain faults; label 'N' is used when there is no fault in any of the consistent paths reaching that state; label 'Fi' means that a fault belonging to type 'Fi' has occurred. Inspired by the notion of "belief" proposed in (Pearl, 1988) in probabilistic inference settings, we attempt to capture how confident we are about the occurrence of faults of certain fault types based on the observations seen so far by defining a suitable measure. This measure is called belief and is a way of capturing the likelihood of different execution paths (e.g., via probabilities, power consumption or other constraints).

To introduce the belief measure, we assume that for any firing sequence S enabled at M0, there exists a positive weight function wt(M0, S) which captures the likelihood of the sequence S. We will discuss possible ways to systematically define and compute wt(M0, S) in Section 5.

Definition 6 Given a partially observed Petri net Q with place sensor configuration V, labeling function L, and partition ∆F = {F1, F2, · · · , Fq} of fault transitions T_F, the belief on the occurrence of faults belonging to type Fi after observing the sequence ωp = L_V(S) due to an underlying unknown firing sequence S is defined as

$$b(\omega_p, F_i) = \frac{\sum_{S \in S_p(\omega_p) \text{ and } \exists t \in T_{F_i} \text{ appearing in } S} wt(M_0, S)}{\sum_{S \in S_p(\omega_p)} wt(M_0, S)};$$

the belief on the normal running of the system is defined as

$$b(\omega_p, N) = \frac{\sum_{S \in S_p(\omega_p) \text{ and no fault appears in } S} wt(M_0, S)}{\sum_{S \in S_p(\omega_p)} wt(M_0, S)}.$$

In other words, the belief b(ωp, Fi) is the ratio of the sum of the likelihoods of the paths that contain a fault belonging to type Fi over the sum of the likelihoods of all paths that are consistent with ωp. If b(ωp, Fi) = 0, then no fault of type Fi has occurred; if b(ωp, Fi) = 1, then a fault of type Fi must have occurred; if 0 < b(ωp, Fi) < 1, then there is ambiguity about the occurrence of faults of type Fi. Therefore, the diagnosis results in (Sampath et al, 1995) can also be obtained via the use of the belief measure as special cases; moreover, if b(ωp, Fi) is closer to 1 (or 0), we are more (or less) confident about the occurrence of a fault of type Fi.
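As a small numerical illustration of Definition 6, consider the ambiguous case of Example 2, where the consistent firing sequences are t2 and t2t5 and t5 is the only fault transition. The Python sketch below computes the beliefs directly from an explicit list of consistent sequences; the weights 3 and 1 are hypothetical values chosen for the illustration (the paper defers the definition of wt to Section 5), and the function name `beliefs` is an assumption of the sketch.

```python
from fractions import Fraction


def beliefs(weighted_sequences, fault_types):
    """Compute b(., F_i) for each fault type and b(., N).

    weighted_sequences: dict mapping each consistent firing sequence (a tuple
                        of transition names) to its positive integer weight
    fault_types:        dict mapping a type name to its set of fault transitions
    """
    total = sum(weighted_sequences.values())
    result = {}
    for name, faults in fault_types.items():
        mass = sum(w for seq, w in weighted_sequences.items()
                   if any(t in faults for t in seq))
        result[name] = Fraction(mass, total)
    all_faults = set().union(*fault_types.values())
    normal = sum(w for seq, w in weighted_sequences.items()
                 if not any(t in all_faults for t in seq))
    result["N"] = Fraction(normal, total)
    return result


# Hypothetical weights for the two sequences consistent with 0 -> b -> 0.
weights = {("t2",): 3, ("t2", "t5"): 1}
result = beliefs(weights, {"F1": {"t5"}})
print(result)
```

With a single fault type, the fault belief and the normal belief sum to 1. With several fault types this need not hold for the fault beliefs, since one path may contain faults of more than one type.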

3.3 Problem Formulation

Now we are in a position to formulate the fault diagnosis problem. Given a partially observed Petri net Q = (G, Po, To) with a place sensor configuration V, a labeling function L, a partition ∆F = {F1, F2, · · · , Fq} of fault transitions T_F, a weight function wt(M0, S) defined for all firing sequences from M0, and an observation sequence

ωp = M_{0,V} → e1 → M_{1,V} → e2 → M_{2,V} · · · → ek → M_{k,V},    (1)

i.e., ωp = L_V(S) due to an underlying (unknown) firing sequence S, our goal is to calculate the beliefs on the occurrence of each fault type, i.e., b(ωp, Fi) for 1 ≤ i ≤ q, and the belief on the normal running of the system, i.e., b(ωp, N).

The problem will be solved in two steps: i) we first transform a partially observed Petri net into an equivalent labeled Petri net as shown in Section 4, and ii) we propose a monitoring scheme for labeled Petri nets to facilitate calculating the beliefs recursively as shown in Section 5.

4 Equivalent Model

In this section, we consider how to handle information from place sensors and show that the problem of fault diagnosis in partially observed Petri nets can be reduced to the problem of fault diagnosis in labeled Petri nets. More specifically, given a partially observed Petri net Q = (G, Po, To) with a place sensor configuration V, a labeling function L, and an observation sequence as described in Eq. (1), we show how to construct a labeled Petri net (G, Σ′, L′) and obtain a translator of the observation sequence ωp in Eq. (1) into a sequence of labels ω ∈ Σ′∗, such that Sp(ωp) = S(ω) and Cp(ωp) = C(ω). We first use the following example to illustrate the main idea.

Example 3 (continued) In this example, we consider the Petri net in Fig. 1 with place sensor configuration V and labeling function L as defined in Example 1. To make the analysis easier, we write down D_V = D(p2, :) together with the label of each column as follows

transition:   t1   t2   t3   t4   t5
D_V       = [  0    1   −1    0   −1 ]
label:         a    b    b    ε    ε

where D_V is the submatrix of D with rows corresponding to places with sensors in V. Now we will construct an equivalent labeled Petri net (G, Σ′, L′) by defining a new alphabet Σ′ and a new labeling function L′. The idea is to have alphabet Σ′ be a refinement of Σ, and have the labeling function L′ split labels of L into multiple labels depending on any additional information provided by place sensors. For instance, the firing of transition t2 (or t3) generates label b and causes the token change 1 (or −1) in place p2; since the external observer can use the token change information to resolve the ambiguity due to label b, we can define new labels b1 and b2, and let L′(t2) = b1 and L′(t3) = b2. Now we consider unobservable transitions t4 and t5. As the firing of transition t4 does not generate any label or visible token changes, we let L′(t4) = ε, i.e., t4 is still unobservable even with the place sensor. Though the firing of transition t5 does not generate any label, it does generate the token change −1 in place p2; therefore, we define a new label ε1 and let L′(t5) = ε1, which implies that transition t5 becomes observable under L′. Finally, since the firing of transition t1 generates the uniquely associated label a, we can define² a new label a1 and let L′(t1) = a1. In summary, Σ′ = {a1, b1, b2, ε1} with L′(t1) = a1, L′(t2) = b1, L′(t3) = b2, L′(t4) = ε, and L′(t5) = ε1. Note that we have implicitly defined the translator for a sequence of observations of the type in Eq. (1): for example, because [0] → b → [1] gets translated to b1, and [1] → ε → [0] gets translated to ε1, the observation [0] → b → [1] → ε → [0] gets translated into b1ε1. In Example 2, transition t5 is assumed to be the fault transition. However, t5 becomes observable in the constructed labeled Petri net and is identifiable by the occurrence of the unique label ε1. □
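The translator that Example 3 defines implicitly can be written as a lookup keyed by the pair (original label, visible token change). The Python sketch below is a minimal illustration for the sensor on p2 only; the table entries follow the labels introduced in Example 3, while the flat-list encoding of the observation sequence and the names `TABLE` and `translate` are assumptions of the sketch.

```python
EPS = "ε"  # null label

# (original label, token change at the sensed place p2) -> refined label.
TABLE = {
    ("a", (0,)): "a1",
    ("b", (1,)): "b1",
    ("b", (-1,)): "b2",
    (EPS, (-1,)): "ε1",
}


def translate(observation, table):
    """Translate [M0_V, e1, M1_V, e2, ..., ek, Mk_V] into refined labels."""
    labels = []
    for i in range(1, len(observation), 2):
        x, e, y = observation[i - 1], observation[i], observation[i + 1]
        delta = tuple(b - a for a, b in zip(x, y))  # visible marking variation
        labels.append(table[(e, delta)])
    return labels


omega = translate([[0], "b", [1], EPS, [0]], TABLE)
print(omega)
```

A pair that is missing from the table raises a `KeyError`, which in this sketch signals an observation unit that no transition of the net can generate.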

Intuitively, with the addition of place sensors, some nondeterministic transitions become deterministic (e.g., transitions t2, t3 in Example 3) and some unobservable transitions become observable (e.g., transition t5 in Example 3). Therefore, we can define a new labeling function L′ that (is a refinement of the original function L and) takes both the original labeling function L and the place sensor configuration V into account.

To formalize the idea in Example 3, we first define partitions of Te (for e ∈ Σ and e = ε) as generated by the place sensor configuration V.

Definition 7 Given a partially observed Petri net Q with place sensor configuration V and labeling function L, the partition of Te generated by V for e ∈ Σ is defined to be Ωe(V) = {S1, S2, ..., S_{r_e}}, where

i) S1 ∪ S2 ∪ · · · ∪ S_{r_e} = Te;
ii) Si ∩ Sj = ∅ if i ≠ j;
iii) Si ≠ ∅ is a set with the maximal number of transitions tj, ..., tl that satisfy tj, · · · , tl ∈ Te and D_V(:, j) = · · · = D_V(:, l).

Definition 8 Given a partially observed Petri net Q with place sensor configurationV and labeling function L, the partition of Tε generated by V is defined to be Ωε(V ) =S0, S1, ..., Srε, where

i) S0 ∪ S1 ∪ · · · ∪ Srε = Tε;ii) Si ∩ Sj = ∅ if i '= j;iii) S0 is a (possibly empty) set with the maximal number of transitions tj , ..., tl that

satisfy tj , · · · , tl ∈ Tε and DV (:, j) = · · · = DV (:, l) = 0||V ||×1;

2 To avoid confusion between Σ and Σ′, we define a new label a1 instead of using the original label a.


iv) Si ≠ ∅ for i ≠ 0 is a set with the maximal number of transitions tj, ..., tl that satisfy tj, ..., tl ∈ Tε and DV(:, j) = · · · = DV(:, l) ≠ 0_{||V||×1}.

Note that r_e for e ∈ Σ is the number of distinct columns in D^e_V and r_ε is the number of distinct nonzero columns in D^ε_V, where D^e_V for e ∈ Σ ∪ {ε} is the submatrix of DV with columns corresponding to transitions mapped to the label e in L.

Example 4 (continued) For the net in Fig. 1, Ωa(V) = {{t1}}, Ωb(V) = {{t2}, {t3}}, and Ωε(V) = {{t4}, {t5}}, where S0 = {t4} (no visible token change) and S1 = {t5}. □

Now we are ready to propose the transformation from partially observed Petri nets to labeled Petri nets.

Algorithm 1 Transforming partially observed Petri nets to labeled Petri nets
Input: a partially observed Petri net Q = (G, Po, To) with place sensor configuration V and labeling function L
Output: a labeled Petri net (G, Σ′, L′)

1. Initialize Σ′ to be ∅;
2. Define L′ by iterating through all e ∈ Σ ∪ {ε} as follows:
   – for each label e ∈ Σ (Te = S1 ∪ S2 ∪ · · · ∪ S_{r_e}): Σ′ = Σ′ ∪ {e1, e2, ..., e_{r_e}}; if t ∈ Si for some i in {1, 2, ..., r_e}, then let L′(t) = ei;
   – for label ε (Tε = S0 ∪ S1 ∪ · · · ∪ S_{r_ε}): Σ′ = Σ′ ∪ {ε1, ε2, ..., ε_{r_ε}}; if t ∈ S0, then L′(t) = ε, and if t ∈ Si for some i in {1, 2, ..., r_ε}, then L′(t) = εi;
3. Output (G, Σ′, L′).
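As a minimal sketch of Algorithm 1, the partitions of Definitions 7 and 8 can be obtained by grouping transitions on the pair (original label, column of DV). The function name `transform_labels`, the dictionary encoding of L and DV, and the string `"eps"` standing for ε are illustrative assumptions; the data are those of Example 3 (one place sensor on p2). The indices of the new labels are assigned in order of first appearance, which is one of several valid numberings.

```python
from collections import defaultdict

EPS = "eps"  # stands for the null label ε (naming is an assumption of this sketch)

def transform_labels(labels, dv_columns):
    """Sketch of Algorithm 1.

    labels:     dict transition -> original label (EPS if unobservable)
    dv_columns: dict transition -> tuple, the column of D_V for that transition
    Returns (new_alphabet, new_labeling).
    """
    # Transitions with the same label and the same visible token change
    # fall into the same set S_i of the partition.
    groups = defaultdict(list)
    for t in sorted(labels):
        groups[(labels[t], tuple(dv_columns[t]))].append(t)

    new_alphabet, new_labeling = [], {}
    counters = defaultdict(int)
    for (e, delta), transitions in groups.items():
        if e == EPS and not any(delta):
            # S_0: no label and no visible token change, so still unobservable.
            for t in transitions:
                new_labeling[t] = EPS
            continue
        counters[e] += 1
        new_label = f"{e}{counters[e]}"
        new_alphabet.append(new_label)
        for t in transitions:
            new_labeling[t] = new_label
    return new_alphabet, new_labeling

# Data of Example 3: labels from L, token changes in p2 from D_V.
labels = {"t1": "a", "t2": "b", "t3": "b", "t4": EPS, "t5": EPS}
dv_columns = {"t1": (0,), "t2": (1,), "t3": (-1,), "t4": (0,), "t5": (-1,)}
alphabet, new_labeling = transform_labels(labels, dv_columns)
print(alphabet)      # ['a1', 'b1', 'b2', 'eps1']
print(new_labeling)  # t4 keeps EPS, t5 becomes 'eps1'
```

The result agrees with the labeling L′ obtained informally in Example 3.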

Next, we need to consider how to translate the observation sequence ωp as in Eq. (1) to a sequence of labels in the constructed labeled Petri net. We use x → e → y to denote an observation unit if there exist a state M1 and a transition t such that M1[t〉M2, x = M1,V, y = M2,V, and L(t) = e, where e can be the null label ε. Given an observation unit x → e → y, the most important information is the label e and the (visible) marking variation ∆ = y − x. Now we construct a sensing information mapping table which maps (e, ∆) to e′ ∈ Σ′. There are two cases:

– If e is not null, then find the index i so that there exists a transition t ∈ Te = S1 ∪ · · · ∪ S_{r_e} such that t ∈ Si and DV(:, t) = ∆. Then e′ is ei;

– If e is null, then find the index i so that there exists a transition t ∈ Tε \ S0 = S1 ∪ · · · ∪ S_{r_ε} such that t ∈ Si and DV(:, t) = ∆ (note that in this case ∆ must be nonzero). Then e′ is εi.

It can be verified that the sensing information mapping is a one-to-one mapping from the set of (e, ∆)'s in the partially observed Petri net to transition labels Σ′ in the constructed labeled Petri net because the transition labels are constructed according to (e, ∆) as shown in Algorithm 1. Based on the mapping table, the following algorithm translates a sequence of observations from place sensors and transition labels into a sequence of labels.

Algorithm 2 Sensing information translator
Input: a partially observed Petri net Q with place sensor configuration V, labeling function L and a sequence of sensing information ωp of the form in Eq. (1)
Output: a length-k sequence of labels ω from Σ′ of the transformed labeled Petri net (G, Σ′, L′)


Table 1 Sensing information mapping

(e, ∆)    (a, 0)    (b, 1)    (b, −1)    (ε, −1)
e′        a1        b1        b2         ε1

1. Construct the labeled Petri net (G, Σ′, L′) using Algorithm 1;
2. Construct the sensing information mapping table;
3. Set ω = ε. If ωp is null, exit;
4. For each observation unit x → e → y of the sequence in Eq. (1):
   – Compute ∆ = y − x;
   – Search the table for (e, ∆) and find the corresponding label e′;
   – Set ω to be ωe′;
5. Output ω.

Example 5 (continued) Now we revisit Example 3. Table 1 shows the mapping from (e, ∆) to e′. Using Algorithm 2, the observation [0] → b → [1] → ε → [0] is translated into the observation sequence b1ε1, which is the same as the one we obtained in Example 3 using a more intuitive approach. □
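The translation step of Algorithm 2 can be sketched as a table lookup on (e, ∆). The encoding below is an assumption made for illustration: an observation is an alternating list of visible markings (tuples) and labels, `"eps"` stands for ε, and the dictionary `MAPPING` reproduces Table 1.

```python
EPS = "eps"  # stands for the null label ε (naming is an assumption of this sketch)

# Table 1: (e, Δ) -> e'
MAPPING = {
    ("a", (0,)): "a1",
    ("b", (1,)): "b1",
    ("b", (-1,)): "b2",
    (EPS, (-1,)): "eps1",
}

def translate(observation, mapping):
    """Translate [x0, e1, x1, e2, x2, ...] into a list of labels from Σ'."""
    translated = []
    for i in range(1, len(observation), 2):
        x, e, y = observation[i - 1], observation[i], observation[i + 1]
        delta = tuple(after - before for before, after in zip(x, y))
        translated.append(mapping[(e, delta)])
    return translated

# The observation [0] -> b -> [1] -> ε -> [0] of Example 5.
observation = [(0,), "b", (1,), EPS, (0,)]
translated = translate(observation, MAPPING)
print(translated)  # ['b1', 'eps1']
```

A pair (e, ∆) missing from the table raises `KeyError` in this sketch, which corresponds to an observation unit that no transition of the net can produce.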

Proposition 1 Given a partially observed Petri net Q with place sensor configuration V, labeling function L and a sequence of sensing information ωp of the form in Eq. (1), Sp(ωp) (namely, the set of firing sequences consistent with ωp in Q) is identical to S(ω), the set of firing sequences consistent with ω (which is generated by Algorithm 2) in the transformed labeled Petri net (G, Σ′, L′). Similarly, Cp(ωp) is identical to C(ω) in the labeled Petri net (G, Σ′, L′).

Proof: The results are proved by induction on the observation units. The base case: if ωp is null, then Sp(ωp) = ∅ based on Definition 5. Also, ωp being null implies that there are neither token changes from place sensors nor transition labels from transition sensors; therefore, the output from Algorithm 2 is ω = ε. Accordingly, S(ω) = ∅ based on Definition 2. This establishes the base case. The induction step: suppose ωp = M_{0,V} → e1 → M_{1,V} → e2 → M_{2,V} · · · → ei → M_{i,V} is a sequence with i observation units that gets translated into ω = e′1 e′2 · · · e′i, and Sp(ωp) = S(ω). If we have an extra observation unit M_{i,V} → e_{i+1} → M_{i+1,V} and the unit gets translated into e′_{i+1} by Algorithm 2, we want to show Sp(ωp(M_{i,V} → e_{i+1} → M_{i+1,V})) = S(ωe′_{i+1}). In the partially observed Petri net Q, suppose a firing sequence SS′t is in Sp(ωp(M_{i,V} → e_{i+1} → M_{i+1,V})), where S ∈ Sp(ωp), S′ ∈ (T − T′o)∗, and the firing of t generates M_{i,V} → e_{i+1} → M_{i+1,V}. Then the firing of t in the labeled Petri net (G, Σ′, L′) will generate the corresponding unique label e′_{i+1} from the sensing information mapping table. Because³ S ∈ Sp(ωp) = S(ω), S′ ∈ (T − T′o)∗ = Tε∗, and the firing of t generates e′_{i+1}, SS′t must be in S(ωe′_{i+1}); in other words, we have shown that Sp(ωp(M_{i,V} → e_{i+1} → M_{i+1,V})) ⊆ S(ωe′_{i+1}). The reverse inclusion S(ωe′_{i+1}) ⊆ Sp(ωp(M_{i,V} → e_{i+1} → M_{i+1,V})) can be proved similarly because the sensing information mapping is a one-to-one mapping. This completes the proof for the equivalence of consistent firing sequences. The equivalence of consistent markings can be proved in a similar manner. □

3 T − T′o is the set of transitions that generate neither token changes nor observable transition labels; therefore, such transitions are mapped to ε in the constructed labeled Petri net. In other words, T − T′o = Tε, where Tε is obtained from L′ in (G, Σ′, L′).


Since no observation information is lost through the transformation, we can focus on computing the beliefs in the transformed labeled Petri net. This is considered in the next section.

5 Online Monitor

In this section, we first revisit the belief notion and reformulate the diagnosis problem using the constructed labeled Petri net, and then propose an online monitor to calculate the beliefs.

5.1 Belief Revisited

Based on the transformation in Section 4, the partially observed Petri net Q becomes a labeled Petri net (G, Σ′, L′) and the observation ωp as in Eq. (1) becomes a sequence of labels ω ∈ Σ′∗ of length k. Therefore, the beliefs can be equivalently calculated in the constructed labeled Petri net (for clarity, we use (G, Σ, L) to represent the constructed net instead of (G, Σ′, L′)) as follows: given the constructed labeled Petri net (G, Σ, L) with the translated sequence of labels ω and partition ∆F = {F1, F2, ..., Fq} of fault transitions TF, the belief on the occurrence of faults belonging to type Fi is

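\[
b(\omega_p, F_i) = b(\omega, F_i) = \frac{\displaystyle\sum_{S \in S(\omega) \text{ and } \exists t \in T_{F_i} \text{ appearing in } S} wt(M_0, S)}{\displaystyle\sum_{S \in S(\omega)} wt(M_0, S)}; \tag{2}
\]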

the belief on the normal running of the system is

\[
b(\omega_p, N) = b(\omega, N) = \frac{\displaystyle\sum_{S \in S(\omega) \text{ and no fault appears in } S} wt(M_0, S)}{\displaystyle\sum_{S \in S(\omega)} wt(M_0, S)}. \tag{3}
\]

The diagnosis problem considered in Section 3 becomes equivalent to calculating beliefs in the constructed labeled Petri net; the only change is that in the fault model, fault transitions are not necessarily mapped to ε because fault transitions can become observable (e.g., transition t5 in Example 3).

To systematically define the weight function wt(M0, S) from each individual transition in S and calculate the belief efficiently in a recursive online manner, we assume the existence of a weight function wt(M, t) : R(G, M0) × TM → R^+_0, where TM denotes the set of transitions that are enabled at the marking M and R^+_0 denotes the set of nonnegative real numbers. Such a weight function can describe the likelihood of transition t at marking M (e.g., it can capture the probability that a particular transition occurs at a particular state (Benveniste et al, 2003a)), or it can correspond to the cost of transition t's firing at marking M (which is a generalization of the cost function in (Li et al, 2006)). We can then define the weight function for a sequence of transitions S = ts1 ts2 · · · tsk that is enabled at marking M0 (i.e., M0[ts1〉M1[ts2〉M2 · · · [tsk〉Mk) as an extension of wt(M, t):

\[
wt(M_0, S) = wt(M_0, t_{s_1}) \otimes wt(M_1, t_{s_2}) \otimes \cdots \otimes wt(M_{k-1}, t_{s_k}),
\]

where ⊗ is some associative abstract operation. The exact operation of ⊗ is derived from the meaning of wt(M, t). For example, ⊗ can be ×, the ordinary product of


Fig. 2 Illustration of beliefs.

real numbers, if wt(M, t) captures the probability of the occurrence of t at M; and ⊗ can be +, the ordinary sum of real numbers, if wt(M, t) captures the cost of firing t at M; other choices are also possible. To simplify the discussion, we use ×, the ordinary product of real numbers, from now on.
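The extension of wt(M, t) to a firing sequence is a fold of the per-step weights under the chosen associative operation. The sketch below illustrates this with hypothetical step weights; the function name `path_weight` and the numbers are assumptions, not values from the paper.

```python
from functools import reduce
import operator

def path_weight(step_weights, op=operator.mul):
    """Combine wt(M0, ts1), wt(M1, ts2), ... with an associative operation.

    step_weights must be non-empty; op defaults to the ordinary product.
    """
    return reduce(op, step_weights)

# Hypothetical per-step probabilities along one firing sequence.
probability = path_weight([0.5, 0.25, 1.0])         # product: 0.125
# Hypothetical per-step costs along one firing sequence.
cost = path_weight([2, 3, 1], op=operator.add)       # sum: 6
print(probability, cost)
```

Because the operation is associative, the fold can be evaluated incrementally as transitions fire, which is what the recursive monitor of Section 5.2 relies on.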

Two interesting choices for the weight function wt(M, t) are the following. (i) The first choice is wt(M, t) = 1 for any M and any t that is enabled at M. With this choice, wt(M0, S) = 1 for any firing sequence S. Therefore, b(ω, Fi) is the ratio of the number of paths that contain a fault belonging to type Fi over the number of all paths that are consistent with ω. (ii) The second choice is wt(M, t) = 1/|TM|, which allows each transition enabled at M to occur with equal weight 1/|TM|. This latter weight function gives more weight to paths that have smaller branching. Other reasonable weight functions are also possible (e.g., weight functions might be defined based on probabilistic Petri nets by randomizing choices (Benveniste et al, 2003a)).
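To see how the two choices differ, consider a small hypothetical set of consistent paths (this is not the net of Fig. 2). Each path is encoded as a list of steps, where a step records the number |TM| of transitions enabled at the marking where it fires and whether the fired transition is a fault. The encoding and the name `belief` are assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical consistent paths: each step is (|T_M| at the firing marking, is_fault).
paths = [
    [(2, True), (1, False)],    # faulty path with little branching
    [(2, False), (3, False)],
    [(2, False), (3, False)],
    [(2, False), (3, False)],
]

def belief(paths, weight):
    """Belief on the fault, as in Eq. (2), with the product as path operation."""
    def path_weight(path):
        total = Fraction(1)
        for enabled, _ in path:
            total *= weight(enabled)
        return total
    faulty = sum(path_weight(p) for p in paths if any(f for _, f in p))
    return faulty / sum(path_weight(p) for p in paths)

uniform = belief(paths, lambda enabled: Fraction(1))             # wt(M, t) = 1
branching = belief(paths, lambda enabled: Fraction(1, enabled))  # wt(M, t) = 1/|T_M|
print(uniform, branching)  # 1/4 1/2
```

With wt(M, t) = 1 the belief is the fraction of faulty paths (1 out of 4), whereas with wt(M, t) = 1/|TM| the single faulty path, which branches less, carries half of the total weight.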

Example 6 In Fig. 2, the observation is simply the label a; markings are denoted using black dots (markings in the lowest level are consistent markings), observable (or unobservable) transitions are denoted using solid (dashed) lines; there is only one fault transition tf of type F. If wt(M, t) = 1, then b(a, F) = 1/5; however, if wt(M, t) = 1/|TM|, then b(a, F) = 1/3. □

5.2 Online Monitor

In this subsection we describe cases when we can efficiently calculate beliefs in a recursive manner. The resulting diagnoser can be used online to determine the beliefs associated with each fault type given the observation sequence seen so far.

We now construct an online monitor to recursively obtain the beliefs on the occurrences of fault transitions as observations are made. We make the following reasonable assumption to ensure that there is no arbitrarily long sequence of unobservable transitions; this is also a requirement in (Sampath et al, 1995) and (implicitly) in (Giua and Seatzu, 2005), as Giua et al. assume that the Tε-induced subnet is acyclic. This assumption also guarantees that the online monitor we introduce in this section can be constructed efficiently.

Assumption 1 The Tε-induced subnet of the given labeled Petri net is deadlock structurally bounded.

The other assumption below is on the weight function wt(M0, S). Note that the × in this assumption can also be an arbitrary associative ⊗ as mentioned in Section 5.1.


Assumption 2 The path weight function wt(M0, S) for a path M0[ts1〉M1 · · · [tsk〉Mk satisfies wt(M0, S) = wt(M0, ts1) × wt(M1, ts2) × · · · × wt(Mk−1, tsk).

Eq. (2) can be rewritten as

\[
b(\omega, F_i) = \frac{\displaystyle\sum_{M \in C(\omega)} \Big( \sum_{S \in S(\omega),\ M_0[S\rangle M, \text{ and } \exists t \in T_{F_i} \text{ appearing in } S} wt(M_0, S) \Big)}{\displaystyle\sum_{M \in C(\omega)} \Big( \sum_{S \in S(\omega) \text{ and } M_0[S\rangle M} wt(M_0, S) \Big)}
\]

by grouping the sum of wt(M0, S) for sequences that lead to the same consistent marking M. Note that the inner sum in the numerator (over S ∈ S(ω) with M0[S〉M and some t ∈ TFi appearing in S) is completely determined by the fault type Fi and the marking M, and the inner sum in the denominator (over S ∈ S(ω) with M0[S〉M) is completely determined by the marking M. Therefore, given an observation sequence ω, we can use the following data structure as a node to represent each consistent marking M ∈ C(ω) and to also include information on fault occurrences: (M, K), where (i) M is a consistent marking in C(ω); (ii) K is a (q + 2)-dimensional row vector in which

\[
K(i) = \sum_{S \in S(\omega),\ M_0[S\rangle M, \text{ and } \exists t \in T_{F_i} \text{ appearing in } S} wt(M_0, S) \quad \text{for } i = 1, \ldots, q,
\]

i.e., the weighted sum of consistent paths that drive the system from M0 to M and also contain faults belonging to type Fi;

\[
K(q+1) = \sum_{S \in S(\omega) \text{ and } M_0[S\rangle M} wt(M_0, S),
\]

i.e., the weighted sum of all consistent paths that drive the system from M0 to M;

\[
K(q+2) = \sum_{S \in S(\omega),\ M_0[S\rangle M, \text{ and no fault appears in } S} wt(M_0, S),
\]

i.e., the weighted sum of consistent paths that drive the system from M0 to M without faults. The K(q + 2) entry takes Eq. (3) into account.

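We define Cext(ω) = ∪_{M∈C(ω)} {(M, K)}. Using the data structure (M, K) in Cext(ω), the beliefs can be computed using the following equations:

\[
b(\omega, F_i) = \frac{\displaystyle\sum_{(M,K) \in C_{ext}(\omega)} K(i)}{\displaystyle\sum_{(M,K) \in C_{ext}(\omega)} K(q+1)}, \quad i = 1, \ldots, q,
\]

\[
b(\omega, N) = \frac{\displaystyle\sum_{(M,K) \in C_{ext}(\omega)} K(q+2)}{\displaystyle\sum_{(M,K) \in C_{ext}(\omega)} K(q+1)}. \tag{4}
\]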
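Eq. (4) amounts to summing the entries of K column by column over all nodes and normalizing by the K(q + 1) column. The sketch below assumes Cext(ω) is stored as a dictionary from markings (tuples) to K vectors (lists, 0-based, so K(q + 1) is `K[q]`); the two nodes with q = 2 are hypothetical.

```python
def beliefs(c_ext, q):
    """Evaluate Eq. (4) on c_ext: dict marking -> K, where K has q + 2 entries."""
    total = sum(K[q] for K in c_ext.values())  # sum of K(q+1)
    fault = [sum(K[i] for K in c_ext.values()) / total for i in range(q)]
    normal = sum(K[q + 1] for K in c_ext.values()) / total  # sum of K(q+2)
    return fault, normal

# Hypothetical extended consistent markings with q = 2 fault types.
c_ext = {
    (1, 0): [1, 0, 2, 1],  # 2 paths: one contains an F1 fault, one is fault-free
    (0, 1): [1, 1, 2, 0],  # 2 paths: one with an F1 fault, one with an F2 fault
}
fault_beliefs, normal_belief = beliefs(c_ext, q=2)
print(fault_beliefs, normal_belief)  # [0.5, 0.25] 0.25
```

In general the beliefs need not sum to one, because a single firing sequence may contain faults of several types and is then counted in several entries of K.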

Note that Cext(ω) consists essentially of the consistent markings plus belief information. We can compute consistent markings recursively (Ru and Hadjicostis, 2009a) and we now show that we can also compute Cext(ω) recursively, which implies that beliefs can be computed recursively.

If ω = ε, there is only one consistent marking M0. Therefore, Cext(ε) is initialized as {(M0, K0)}, where K0 = [0_{1×q} 1 1]. Now suppose we have computed Cext(ω) and observe a new label e; we need to calculate Cext(ωe). A typical plot of the process is shown in Fig. 3. In this figure, Cext(ω) = {(M1, K1), (M2, K2), (M3, K3), (M4, K4)}, Cext(ωe) = {(M9, K9), (M10, K10)}, dashed lines denote unobservable transitions, solid lines denote observable transitions that generate e, tf1 and tf2 are fault transitions,


Fig. 3 Calculation of Cext(ωe) = {(M9, K9), (M10, K10)} from Cext(ω) = {(M1, K1), (M2, K2), (M3, K3), (M4, K4)}.

and other nodes are intermediate nodes. There cannot be cycles in the update process because of Assumption 1.

We divide the computation of Cext(ωe) from Cext(ω) into two steps: in the first step, we ignore the observed label e and obtain all markings reachable by firing only unobservable transition sequences (e.g., the graph consisting of dashed edges and related nodes in Fig. 3); in the second step, we take the label e into account to get Cext(ωe) (e.g., the graph consisting of solid edges and related nodes in Fig. 3).

In the first step, we use Cuo to denote the set of nodes (M, K) such that (M, K) is reachable from some node in Cext(ω) by firing only unobservable transitions. Therefore, Cext(ω) ⊆ Cuo. When updating beliefs given the current node (M, K) ∈ Cuo and one enabled (unobservable) transition t, the state M′ in the next node (M′, K′) can be obtained as M′ = M + D(:, t). Depending on whether t is a fault transition and on whether M′ has appeared in some node computed in the current step, there are four cases to consider:

– Case I: t is a normal transition and M′ does not exist in nodes computed at the current step (e.g., transition t1 is enabled at M1 and the resulting marking M5 has not been computed, as shown in Fig. 3). In this case, create a new node (M′, K′) (e.g., (M5, K5)), in which K′(i) = K(i) × wt(M, t) for i = 1, ..., q + 2. This works because, if Si denotes the set of firing sequences that drive the system from M0 to M and satisfy the corresponding properties for the different i's, then K′(i) = Σ_{S∈Si} wt(M0, St) = Σ_{S∈Si} (wt(M0, S) × wt(M, t)) = wt(M, t) × Σ_{S∈Si} wt(M0, S) = wt(M, t) × K(i) for i = 1, 2, ..., q + 2;

– Case II: t is a fault transition belonging to type Fi and M′ does not exist in nodes computed at the current step (e.g., fault transition tf1 is enabled at M5 and the resulting marking M6 has not been computed, as shown in Fig. 3). In this case, create a new node (M′, K′) (e.g., (M6, K6)), in which K′(j) = K(j) × wt(M, t) for j ∈ {1, ..., q + 1} − {i}, K′(i) = K(q + 1) × wt(M, t) and K′(q + 2) = 0;

– Case III: t is a normal transition and M′ exists in node (M′, K′) computed at the current step (e.g., transition t2 is enabled at M7 and the resulting marking M6 was computed from (M5, K5) by firing transition tf2, as shown in Fig. 3). In this case, let K′(i) = K′(i) + K(i) × wt(M, t) for i = 1, ..., q + 2;

– Case IV: t is a fault transition belonging to type Fi and M′ exists in node (M′, K′) computed at the current step (e.g., fault transition tf2 is enabled at M3 and the resulting marking M7 was computed from (M2, K2) by firing transition t3, as shown in Fig. 3). In this case, let K′(j) = K′(j) + K(j) × wt(M, t) for j ∈ {1, ..., q + 1} − {i}, and K′(i) = K′(i) + K(q + 1) × wt(M, t). Note that in this case K′(q + 2) does not change.
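The four cases can be folded into a single update, since Cases III and IV only add the contribution of Cases I and II to an existing vector. The sketch below is one possible encoding, with 0-based indices (so K(q + 1) is `K[q]` and K(q + 2) is `K[q + 1]`); the function name `propagate` and its arguments are illustrative assumptions.

```python
def propagate(K, w, q, fault_type=None, K_next=None):
    """Return K' after firing a transition t of weight w = wt(M, t) from (M, K).

    fault_type: None if t is normal, else the 0-based index of its fault type.
    K_next:     the vector already stored for M' (Cases III/IV), or None (Cases I/II).
    """
    contribution = [k * w for k in K]
    if fault_type is not None:
        # Every path through t now contains a fault of this type ...
        contribution[fault_type] = K[q] * w
        # ... and none of them is fault-free.
        contribution[q + 1] = 0
    if K_next is None:
        return contribution                                # Cases I and II
    return [a + b for a, b in zip(K_next, contribution)]   # Cases III and IV

q = 1
case_ii = propagate([0, 1, 1], w=1, q=q, fault_type=0)           # [1, 1, 0]
case_iii = propagate([0, 2, 2], w=1, q=q, K_next=case_ii)        # [1, 3, 2]
case_iv = propagate([0, 1, 1], w=1, q=q, fault_type=0, K_next=case_iii)
print(case_ii, case_iii, case_iv)  # [1, 1, 0] [1, 3, 2] [2, 4, 2]
```

In Case IV the last entry is unchanged because the contribution to K′(q + 2) is zero, which matches the remark at the end of the case list.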

The argument for Case II is similar to Case I except that t is a fault transition of type Fi. In Case IV (Case III is similar to Case IV and we do not discuss it explicitly here), (M7, K7) has already been created based on the firing of t3 from (M2, K2). As the firing


of fault transition tf2 from (M3, K3) also results in the marking M7, we need to update K7. The value of K7(q + 2) does not change because it denotes the weighted sum of paths without faults; for j ∈ {1, ..., q + 1} − {i}, K7(j) = K7(j) + K3(j) × wt(M3, tf2), while K7(i) = K7(i) + K3(q + 1) × wt(M3, tf2) because all paths from the initial marking to M7 going through M3 contain the fault transition tf2 of type Fi.

Note that there is a dependency issue in this update process: more specifically, suppose we calculate (M, K) in Fig. 3 by considering all firing sequences consisting of unobservable transitions from (M1, K1), (M2, K2), (M3, K3), (M4, K4) sequentially. After we are done with (M1, K1) and (M2, K2), we consider (M3, K3) and update (M7, K7). However, (M6, K6) also depends on (M7, K7) and therefore, we also need to update all markings that are reachable from M7. To avoid this dependency issue, we can first generate the reachability graph in Fig. 3 and then update the value K in node (M, K) only when the K′ in every node (M′, K′) from which M is reachable has been finalized and will not be updated further. Though the computation of Cuo amounts to reachability analysis in general, Assumption 1 ensures that |Cuo| is finite (Ru and Hadjicostis, 2009a) and that the reachability graph is acyclic.

In the second step, we take the label e into account to get Cext(ωe). All we need to do is consider all transitions that are mapped to e at all markings in Cuo. The update process is essentially the same as the four cases discussed in the calculation of Cuo (because fault transitions can also be mapped to the label e).

The online monitor is given below.

Algorithm 3 Online monitor

1. ω0 = ε, Cext(ω0) = {(M0, K0)}.
2. Let i = 0.
3. Wait until a new event e is observed.
4. Let i = i + 1, ωi = ωi−1 e, Cext(ωi) = ∅.
5. Calculate Cuo following the rules in Cases I-IV.
6. For all (M, K) ∈ Cuo
     For all t such that L(t) = e and M[t〉
       Compute M′ = M + D(:, t): (i) if M′ does not appear in any node of Cext(ωi), calculate K′ using the rules in Cases I-II and add (M′, K′) into Cext(ωi); (ii) if M′ exists in node (M′, K′) of Cext(ωi), update K′ using the rules in Cases III-IV.
7. Output the beliefs b(ωi, Fj) for j = 1, ..., q and b(ωi, N) using Eq. (4).
8. Goto 3.
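The following is a minimal sketch of the monitor, not the authors' implementation. It assumes wt(M, t) = 1, a single fault type (q = 1), and a hypothetical three-place net given by pre/post vectors; all names (`NET`, `observe`, and so on) are illustrative. The unobservable closure Cuo is computed on an explicit reachability graph and nodes are finalized in topological order, which addresses the dependency issue discussed above and relies on the graph being acyclic (Assumption 1).

```python
EPS = "eps"  # stands for the null label ε (naming is an assumption of this sketch)

# Hypothetical net: name -> (pre, post, label, fault type index or None).
NET = {
    "t1": ((1, 0, 0), (0, 1, 0), "a", None),
    "tf": ((0, 1, 0), (0, 0, 1), EPS, 0),
    "t2": ((0, 1, 0), (1, 0, 0), "b", None),
    "t3": ((0, 0, 1), (1, 0, 0), "b", None),
}
Q = 1  # number of fault types

def successors(M, wanted_label):
    """Yield (M', fault type) for each enabled transition labeled wanted_label."""
    for pre, post, label, ftype in NET.values():
        if label == wanted_label and all(m >= p for m, p in zip(M, pre)):
            yield tuple(m - p + o for m, p, o in zip(M, pre, post)), ftype

def contribution(K, ftype, w=1):
    """Cases I/II: weight carried along one transition (wt(M, t) = 1 assumed)."""
    c = [k * w for k in K]
    if ftype is not None:
        c[ftype] = K[Q] * w
        c[Q + 1] = 0
    return c

def merge(nodes, M, c):
    """Cases III/IV: add a contribution to the vector stored for M (or create it)."""
    nodes[M] = [a + b for a, b in zip(nodes.get(M, [0] * (Q + 2)), c)]

def unobservable_closure(c_ext):
    """Step 5: compute Cuo, finalizing each node before propagating from it."""
    edges, stack = {}, list(c_ext)
    while stack:  # build the reachability graph over unobservable transitions
        M = stack.pop()
        if M not in edges:
            edges[M] = list(successors(M, EPS))
            stack.extend(M2 for M2, _ in edges[M])
    indegree = {M: 0 for M in edges}
    for outgoing in edges.values():
        for M2, _ in outgoing:
            indegree[M2] += 1
    c_uo = {M: list(c_ext.get(M, [0] * (Q + 2))) for M in edges}
    ready = [M for M in edges if indegree[M] == 0]
    while ready:  # topological order; terminates fully only on acyclic graphs
        M = ready.pop()
        for M2, ftype in edges[M]:
            merge(c_uo, M2, contribution(c_uo[M], ftype))
            indegree[M2] -= 1
            if indegree[M2] == 0:
                ready.append(M2)
    return c_uo

def observe(c_ext, e):
    """Steps 4-6: compute Cext(ωe) from Cext(ω) and the new label e."""
    next_ext = {}
    for M, K in unobservable_closure(c_ext).items():
        for M2, ftype in successors(M, e):
            merge(next_ext, M2, contribution(K, ftype))
    return next_ext

def fault_beliefs(c_ext):
    """Step 7: Eq. (4) for the fault types."""
    total = sum(K[Q] for K in c_ext.values())
    return [sum(K[i] for K in c_ext.values()) / total for i in range(Q)]

c_ext = {(1, 0, 0): [0] * Q + [1, 1]}  # Cext(ε) = {(M0, K0)}
history = []
for e in ["a", "b", "a"]:
    c_ext = observe(c_ext, e)
    history.append(fault_beliefs(c_ext))
print(history)  # [[0.0], [0.5], [0.5]]
```

In this toy net, observing a leaves only the fault-free path; after b there are two consistent paths back to the initial marking (t1 t2 and t1 tf t3), one of which contains the fault, so the belief becomes 0.5. An observation that no firing sequence can explain leaves Cext empty, and the sketch then fails with a division by zero rather than handling that case.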

We now briefly explain the algorithm. Steps 1 and 2 initialize Cext(ω); Steps 3-8 update Cext(ω) as an extra label e is observed. In Step 5, we compute all nodes in Cuo that can be reached from consistent nodes in Cext(ωi−1) by firing sequences consisting of unobservable transitions, while in Step 6, we update Cuo to compute Cext(ωi) by considering all transitions that can be mapped to the most recently observed label e. This is consistent with the definition of the set of consistent firing sequences in that the last transition is required to be observable.

The above algorithm is based on consistent markings that can be reached via different firing sequences. The complexity of Algorithm 3 is polynomial in the length k of the observed label sequence, as we argue next. First, we recall the following slightly different definition of consistent markings (Giua et al, 2005).


Definition 9 Given a labeled Petri net (G, Σ, L) with initial marking M0 and an observed label sequence ω ∈ Σ∗, the set of consistent markings is C′(ω) = {M | ∃S ∈ T∗ : M0[S〉M and L(S) = ω}.

Here we use C′(ω) to distinguish it from Definition 2. Essentially, in this definition the last transition in the firing sequence S such that M0[S〉M and L(S) = ω need not be observable. Therefore, |C(ω)| ≤ |C′(ω)|. In (Ru and Hadjicostis, 2009a), we show that |C′(ω)| is O(k^b) for a given labeled Petri net with a deadlock structurally bounded Tε-induced subnet, where k is the length of ω, b = j(l − 1) + lε, j is the number of labels that are associated with more than one transition, l is the maximal number of transitions that can be associated with the same label, and lε is the number of transitions associated with ε. Thus, |C(ω)| is O(k^b).
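As a small illustration of the exponent b = j(l − 1) + lε, the sketch below evaluates it for the labeling L of Example 3 and for its refinement L′. The helper name `exponent_b` is hypothetical, and the sketch assumes that l ranges over the labels other than ε; in both cases shown this assumption does not affect the value.

```python
from collections import Counter

def exponent_b(labeling, eps="eps"):
    """Evaluate b = j(l - 1) + l_eps for a labeling given as dict transition -> label."""
    counts = Counter(labeling.values())
    l_eps = counts.pop(eps, 0)                           # transitions mapped to ε
    j = sum(1 for c in counts.values() if c > 1)         # labels shared by several transitions
    l = max(counts.values(), default=1)                  # largest number of transitions per label
    return j * (l - 1) + l_eps

original = {"t1": "a", "t2": "b", "t3": "b", "t4": "eps", "t5": "eps"}
refined = {"t1": "a1", "t2": "b1", "t3": "b2", "t4": "eps", "t5": "eps1"}
b_original = exponent_b(original)  # j = 1, l = 2, l_eps = 2
b_refined = exponent_b(refined)    # j = 0, l = 1, l_eps = 1
print(b_original, b_refined)  # 3 1
```

For this net, the refinement induced by the place sensor lowers the exponent of the bound from 3 to 1.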

Suppose we have computed Cext(ω_{k−1}). In the first step of computing Cext(ωk), we first compute Cuo. As |Cuo| ≤ |C′(ω_{k−1})|, |Cuo| is O((k − 1)^b), which implies Cuo can be calculated with complexity polynomial in k. In the second step, if a new event e is observed after the observation sequence ω_{k−1}, then we need to (i) consider every transition associated with e for every marking in Cuo, (ii) obtain the next marking M if the transition is enabled, (iii) compare the new marking M with other consistent markings computed at stage k and calculate/update the corresponding K, and (iv) add (M, K) to Cext(ωk) if it is not already included or update K if M is already included. Roughly, the complexity is |Cuo| × |Te| × (n + n + nNk + q + 2) in terms of scalar comparisons and additions, where the first n is the number of comparisons to determine whether some transition associated with e is enabled, the second n is the number of additions to compute the next marking, nNk is the number of comparisons to check whether the next marking has already been added into Cext(ωk) (Nk := |Cext(ωk)|), and q + 2 is the number of additions to update K. Using the previous bound, it is easy to verify that the complexity of the second step is O(l(k − 1)^b (n k^b + q)). Clearly, the computation of Cext(ωk) from Cext(ω_{k−1}) is polynomial in k, which implies that the complexity of computing Cext(ωk) starting from (M0, K0) using Algorithm 3 is also polynomial in k.

The overall architecture of state estimation and fault diagnosis for partially observed Petri nets is shown in Fig. 4.

5.3 Extensions

In this subsection, we discuss extensions of the monitor scheme to handle repeated faults and multiple faults.

Fig. 4 State estimation and fault diagnosis in DESs.


It is possible that in some systems the same type of fault repeats multiple times (e.g., intermittent or non-persistent faults may occur repeatedly) (Jiang et al, 2002). There can also be multiple faults. In (Hadjicostis and Verghese, 1999), faults are modeled as token corruptions in places; in that setting, tokens in multiple places can be corrupted because of multiple faults. In Petri nets allowing concurrency, multiple faults can occur simultaneously when faults are modeled as unobservable transitions. However, in this paper, at most one transition can fire at any instant and there is only one (unknown) underlying firing sequence. Therefore, repeated (or multiple) faults are defined as faults of the same (or different) types occurring in the same firing sequence.

For example, we consider two types of faults TF1 and TF2. Given an observation sequence ω, we say faults in TF1 and TF2 possibly occur if there is a firing sequence S ∈ S(ω) such that there exist transitions t1 ∈ TF1 and t2 ∈ TF2 and t1, t2 appear in S. To calculate b(ω, F1, F2), i.e., the belief on the occurrence of faults of types F1 and F2, we need to expand the K vector with an additional entry K(q + 3) to track the sum of the weights of firing sequences containing faults of these two types. Moreover, we need to remember if a fault of type F1 (or F2) has occurred in some sequence so that we can update the entry K(q + 3) when we find another fault of type F2 (or F1) later on. Except for the need to remember the occurrence of faults of a single type, the rules for the update are essentially the same as those in Cases I-IV. Similarly, we can calculate beliefs on the occurrence of repeated faults and faults of other multiple fault types.
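One way to "remember" which fault types have already occurred is sketched below. It is a different bookkeeping from the K(q + 3) entry described above, introduced here only for illustration: each node stores a dictionary from the set of fault types seen so far to the weighted sum of the paths with exactly that set. All names are hypothetical, and the number of keys can grow exponentially with the number of fault types.

```python
def fire(W, w, fault_type=None):
    """Propagate W (dict: frozenset of fault types seen -> weight) along one transition."""
    out = {}
    for seen, weight in W.items():
        key = seen | {fault_type} if fault_type is not None else seen
        out[key] = out.get(key, 0) + weight * w
    return out

def merge(W1, W2):
    """Combine the contributions of two sets of paths reaching the same marking."""
    out = dict(W1)
    for seen, weight in W2.items():
        out[seen] = out.get(seen, 0) + weight
    return out

def joint_belief(nodes, types):
    """Belief that faults of all the given types occurred (nodes: marking -> W)."""
    total = sum(sum(W.values()) for W in nodes.values())
    joint = sum(weight for W in nodes.values()
                for seen, weight in W.items() if types <= seen)
    return joint / total

start = {frozenset(): 1}                    # one fault-free path of weight 1
after_f1 = fire(start, 1, "F1")             # a fault of type F1 occurs
after_both = fire(after_f1, 1, "F2")        # then a fault of type F2 occurs
other = merge(merge(after_f1, after_f1), fire(start, 1))  # 2 paths with F1 only, 1 fault-free
nodes = {"Ma": after_both, "Mb": other}     # hypothetical consistent markings
b_f1_f2 = joint_belief(nodes, frozenset({"F1", "F2"}))
print(b_f1_f2)  # 0.25
```

The belief on a single type is recovered by passing a one-element set, for example `joint_belief(nodes, frozenset({"F1"}))`.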

6 Example

In this section, we use a simple communication protocol to illustrate the monitor. The partially observed Petri net model of the protocol is shown in Fig. 5(a) and it is adapted from the net in Fig. 1 in (Giua and Seatzu, 2005) by adding transition t8 and related arcs (so that the system is deadlock-free). In this model, L(t1) = a, L(t7) = L(t8) = b, L(t2) = L(t3) = L(t4) = L(t5) = L(t6) = ε, and transition t6 is a fault transition of type F. Places p6 and p7 are unobservable and all other places are observable. Assume the place sensor configuration V is (0 0 0 1 0)^T, i.e., only place p4 has a sensor. Suppose the weight function satisfies wt(M, t) = 1 for all reachable markings M and for all transitions t enabled at M.

The net models a communication protocol: messages ready to be sent are divided into two packets (transition t1) to be sent over two separate channels (places p6 and p7). The two packets are finally combined and an acknowledgement is sent to the sender (transition t7). A fault occurs when a packet that should be traveling on the first channel is erroneously moved to the second channel (transition t6). In this case, a time-out signal is sent to the sender (transition t8) and the message is sent again. Here we do not distinguish between the acknowledgement and the time-out signal, though we do assume that they are observable.

Using the construction in Section 4, the equivalent labeled Petri net model is shown in Fig. 5(b). The new alphabet is Σ′ = {a1, b1, b2, ε1} and the new labeling function is L′(t1) = a1, L′(t7) = b1, L′(t8) = b2, L′(t4) = ε1, L′(t2) = L′(t3) = L′(t5) = L′(t6) = ε. Notice that unobservable transition t4 becomes observable with label ε1 but fault transition t6 is still unobservable. Using the vector y = (3 4 1 2 3)^T, the Tε-induced subnet in Fig. 5(b) is easily verified to be deadlock structurally bounded. This means


(a) Partially observed Petri net

(b) Equivalent labeled Petri net

Fig. 5 Petri net models of a communication protocol.

Fig. 6 Structure of nodes corresponding to the observation of the sequence of labels a1ε1ε1.

that we can apply the techniques in Section 5 to compute the beliefs on fault transition t6 with complexity that is polynomial in the length of the observation sequence.

Now let us assume that the observation from the net in Fig. 5(a) is given by 0 → a → 0 → ε → 1 → ε → 2, which is translated to a label sequence ω = a1ε1ε1 for the labeled Petri net in Fig. 5(b). The structure of the nodes corresponding to ω = a1ε1ε1 is shown in Fig. 6 (the nodes in the figure are given in Table 2). After observing the label a1, the marking in N1 is the only consistent marking and the corresponding K is (0 1 1). Therefore, the belief b(a1, F) = 0; in other words, we are sure that no fault has occurred after observing a1. If we next observe label ε1, then the markings in nodes N9, N10, N11, N12, N13 are all consistent and the corresponding K's are respectively (1 1 0), (0 2 2), (0 1 1), (0 3 3), (3 3 0). Therefore, by Eq. (4), the belief b(a1ε1, F) = (1 + 0 + 0 + 0 + 3)/(1 + 2 + 1 + 3 + 3) = 0.4. However, after observing the next label ε1, the marking in node N14 is the only one that is consistent, with the corresponding K being (7 7 0). At this point, b(a1ε1ε1, F) = 1, i.e., after observing a1ε1ε1, we are sure that a fault has occurred. This example shows how beliefs can be potentially refined as more and more labels are observed. Note that nodes other than N0, N1, N9, N10, N11, N12, N13 and N14 are intermediate nodes that are reached via the firing of one or more unobservable transitions starting from nodes in Cext(ε), Cext(a1), Cext(a1ε1), and Cext(a1ε1ε1).
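The three belief values can be checked directly from the K vectors listed in Table 2 by applying Eq. (4) with q = 1. The short sketch below does only that arithmetic; the function name `belief_fault` is an illustrative assumption.

```python
def belief_fault(k_vectors):
    """Eq. (4) for q = 1: sum of K(1) over sum of K(2) across consistent nodes."""
    return sum(K[0] for K in k_vectors) / sum(K[1] for K in k_vectors)

after_a1 = [(0, 1, 1)]                                                  # node N1
after_a1_e1 = [(1, 1, 0), (0, 2, 2), (0, 1, 1), (0, 3, 3), (3, 3, 0)]   # nodes N9-N13
after_a1_e1_e1 = [(7, 7, 0)]                                            # node N14

b1 = belief_fault(after_a1)
b2 = belief_fault(after_a1_e1)
b3 = belief_fault(after_a1_e1_e1)
print(b1, b2, b3)  # 0.0 0.4 1.0
```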


Table 2 Nodes in Fig. 6

Node   M                     K
N0     (1 0 0 0 0 0 0)^T     (0 1 1)
N1     (0 1 1 0 0 0 0)^T     (0 1 1)
N2     (0 1 0 0 0 0 1)^T     (0 1 1)
N3     (0 0 1 0 0 1 0)^T     (0 1 1)
N4     (0 1 0 0 1 0 0)^T     (0 1 1)
N5     (0 1 0 0 0 1 0)^T     (1 1 0)
N6     (0 0 0 0 0 1 1)^T     (0 2 2)
N7     (0 0 0 0 1 1 0)^T     (0 3 3)
N8     (0 0 0 0 0 2 0)^T     (3 3 0)
N9     (0 1 0 1 0 0 0)^T     (1 1 0)
N10    (0 0 0 1 0 0 1)^T     (0 2 2)
N11    (0 0 1 1 0 0 0)^T     (0 1 1)
N12    (0 0 0 1 1 0 0)^T     (0 3 3)
N13    (0 0 0 1 0 1 0)^T     (3 3 0)
N14    (0 0 0 2 0 0 0)^T     (7 7 0)

7 Conclusion

In this paper, we studied fault diagnosis in DESs modeled by partially observed Petri nets, i.e., Petri nets equipped with sensors that allow observation of the number of tokens in some of the places and/or observation of the firing of some of the transitions. A transformation scheme was proposed to transform a partially observed Petri net into an equivalent labeled Petri net and an online monitor was constructed to diagnose faults and provide beliefs regarding the occurrences of faults of certain types. Repeated faults and multiple faults were also discussed as an extension of the monitor scheme.

In the future, we plan to consider belief calculation in labeled Petri nets with more general types of Tε-induced subnets; we also plan to consider ways to define and determine the diagnosability of general labeled Petri nets.

Acknowledgements

The authors would like to thank anonymous reviewers for very helpful comments on an earlier conference paper that included some of these results.

References

Aghasaryan A, Fabre E, Benveniste A, Boubour R, Jard C (1998) Fault detection and diagnosis in distributed systems: An approach by partially stochastic Petri nets. Discrete Event Dynamic Systems: Theory and Applications 8:203–231

Benveniste A, Fabre E, Haar S (2003a) Markov nets: Probabilistic models for distributed and concurrent systems. IEEE Transactions on Automatic Control 48:1936–1950

Benveniste A, Fabre E, Haar S, Jard C (2003b) Diagnosis of asynchronous discrete-event systems: A net unfolding approach. IEEE Transactions on Automatic Control 48:714–727

Cassandras CG, Lafortune S (2008) Introduction to Discrete Event Systems (2nd Edition). Springer, New York, USA

Chung SL (2005) Diagnosing PN-based models with partial observable transitions. Int J Computer Integrated Manufacturing 18:158–169

Genc S, Lafortune S (2007) Distributed diagnosis of place-bordered Petri nets. IEEE Transactions on Automation Science and Engineering 4:206–219

Giua A, Seatzu C (2005) Fault detection for discrete event systems using Petri nets with unobservable transitions. In: 44th IEEE Conf. on Decision and Control, Seville, Spain, pp 6323–6328

Giua A, Corona D, Seatzu C (2005) State estimation of λ-free labeled Petri nets with contact-free nondeterministic transitions. Discrete Event Dynamic Systems: Theory and Applications 15:85–108

Giua A, Seatzu C, Corona D (2007) Marking estimation of Petri nets with silent transitions. IEEE Transactions on Automatic Control 52:1695–1699

Hadjicostis CN, Verghese GC (1999) Monitoring discrete event systems using Petri net embeddings. In: Application and Theory of Petri Nets 1999 (Series Lecture Notes in Computer Science, vol. 1639), pp 188–207

Jiang S, Kumar R, Garcia HE (2002) Diagnosis of repeated failures in discrete event systems. In: 41st IEEE Conf. on Decision and Control, Las Vegas, USA, pp 4000–4005

Lefebvre D, Delherm C (2007) Diagnosis of DES with Petri net models. IEEE Transactions on Automation Science and Engineering 4:114–118

Li L, Ru Y, Hadjicostis CN (2006) Least-cost firing sequence estimation in labeled Petri nets. In: Proc. of 45th IEEE Conf. on Decision and Control, San Diego, USA, pp 416–421

Murata T (1989) Petri nets: Properties, analysis and applications. Proceedings of the IEEE 77:541–580

Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann

Peterson JL (1981) Petri Net Theory and the Modelling of Systems. Prentice-Hall, New Jersey, USA

Ramadge PJ, Wonham WM (1989) The control of discrete event systems. Proceedings of the IEEE 77:81–98

Ramírez-Treviño A, Ruiz-Beltrán E, Rivera-Rangel I, López-Mellado E (2007) Online fault diagnosis of discrete event systems. A Petri net-based approach. IEEE Transactions on Automation Science and Engineering 4:31–39

Ru Y, Hadjicostis CN (2009a) Bounds on the number of markings consistent with label observations in Petri nets. To appear in IEEE Transactions on Automation Science and Engineering

Ru Y, Hadjicostis CN (2009b) Sensor selection for structural observability in discrete event systems modeled by Petri nets. Submitted to IEEE Trans on Automatic Control

Sampath M, Sengupta R, Lafortune S, Sinnamohideen K, Teneketzis D (1995) Diagnosability of discrete event systems. IEEE Transactions on Automatic Control 40:1555–1575

Thorsley D, Teneketzis D (2005) Diagnosability of stochastic discrete-event systems. IEEE Transactions on Automatic Control 50:476–492

Ushio T, Onishi I, Okuda K (1998) Fault detection based on Petri net models with faulty behaviors. In: Proc. of IEEE Int. Conf. on Systems, Man, and Cybernetics, San Diego, USA, pp 113–118

Wu Y, Hadjicostis CN (2005) Algebraic approaches for fault identification in discrete-event systems. IEEE Transactions on Automatic Control 50:2048–2053