
2012 IEEE Statistical Signal Processing Workshop (SSP), Ann Arbor, MI, USA, August 5–8, 2012

JOINT STATE AND PARAMETER ESTIMATION FOR BOOLEAN DYNAMICAL SYSTEMS

Ulisses Braga-Neto

Department of Electrical and Computer Engineering
Texas A&M University
College Station, Texas 77843
E-mail: [email protected]

ABSTRACT

In a recent publication, a novel state-space signal model was proposed for discrete-time Boolean dynamical systems. The optimal recursive MMSE estimator for this model is called the Boolean Kalman filter (BKF), and an efficient algorithm was presented for its exact computation. In the present paper, we consider the system identification problem, i.e., the problem of parameter estimation for the case where only incomplete knowledge about the system is available. To solve this problem, we propose the application of the BKF in the context of the well-known paradigm of joint estimation of state and parameters. The approach is illustrated via a network inference example.

Index Terms— Boolean Dynamical Systems, Optimal State Estimation, System Identification, Boolean Network Inference.

1. INTRODUCTION

In modern scientific applications, such as Genomic Signal Processing and Digital Communications, the need often arises for models and optimal estimation methods for dynamical systems of switching bistable components, i.e., Boolean switches, whose interactions are governed by networks of logical gates, updated and observed through noise at discrete time intervals.

In a previous publication, a signal model for Boolean dynamical systems was introduced [1], which consists of a Boolean state process observed through noise. It was shown that the model includes as special cases other well-known models in the literature, such as the Boolean Network (BN), Boolean Network with perturbation (BNp), and Probabilistic Boolean Network (PBN) models. The optimal recursive MMSE estimator for the proposed model was called the Boolean Kalman Filter (BKF), and an algorithm for its exact computation was provided.

The BKF is directly applicable only if full knowledge about the model is available. Here we relax that assumption and suppose that this information is only partially available. The resulting problem of system identification is addressed via application of the BKF in the context of joint state and parameter estimation [2, 3], by introducing a parameter process and auxiliary Boolean signal models for joint and dual estimation.

This paper is organized as follows. Section 2 reviews the Boolean signal model and its MMSE estimator, the Boolean Kalman Filter. Section 3 discusses the main ideas related to the application of the BKF in the joint state and parameter estimation framework. Section 4 presents results from a network inference example. Finally, Section 5 provides a summary and issues for future investigation.

2. BOOLEAN KALMAN FILTER

Assume that the system is described by a state process {Xk; k = 0,1, . . .}, where Xk ∈ {0,1}^d is a Boolean vector of size d. The state is observed indirectly through the observation process {Yk; k = 1,2, . . .}, where Yk is a general vector of any number of continuous or discrete measurements. The Boolean Kalman Filter (BKF) is the recursive minimum mean-square error (MMSE) state estimator for the signal model specified by:

Xk = f(Xk−1) ⊕ nk    (state model)
Yk = h(Xk, vk)    (observation model)    (1)

for k = 1,2, . . .. Here, “⊕” indicates component-wise modulo-2 addition; f : {0,1}^d → {0,1}^d is an arbitrary network function, which expresses a logical relationship between the state vectors at consecutive time points; h(·, vk) : {0,1}^d → O is a general function mapping the current state, corrupted by the noise vk, into the observation space O; and {nk, vk; k = 1,2, . . .} are white-noise processes, with nk ∈ {0,1}^d. The noise processes are “white” in the sense that the noises at distinct time points are independent random variables. It is also assumed that the noise processes are independent of each other and independent of the state process.
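As a concrete illustration of model (1), the dynamics can be simulated directly; the sketch below is hypothetical (the `simulate` helper is ours) and takes h to be the additive binary channel Yk = Xk ⊕ vk used in the example of Section 4, though the model allows a general h:

```python
import numpy as np

def simulate(f, d, p, q, n_steps, x0, rng):
    """Run X_k = f(X_{k-1}) XOR n_k with n_k(i) ~ Bernoulli(p), observed
    through the binary channel Y_k = X_k XOR v_k with v_k(i) ~ Bernoulli(q)."""
    x = np.asarray(x0, dtype=int)
    xs, ys = [], []
    for _ in range(n_steps):
        n = (rng.random(d) < p).astype(int)   # white perturbation noise n_k
        x = f(x) ^ n                          # state model: XOR = modulo-2 addition
        v = (rng.random(d) < q).astype(int)   # white observation noise v_k
        ys.append(x ^ v)                      # observation model
        xs.append(x.copy())
    return np.array(xs), np.array(ys)

# Noise-free sanity check: with p = q = 0 and f the identity, the state
# trajectory is constant and the observations coincide with the state.
rng = np.random.default_rng(0)
xs, ys = simulate(lambda x: x, 3, 0.0, 0.0, 5, [1, 0, 1], rng)
```

With p = q = 0 the trajectory is just the deterministic orbit of f; raising p or q injects the state perturbations and observation errors that the BKF must filter out.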

The optimal filtering problem consists of finding an estimator X̂k = g(Y1, . . . ,Yk) of the state Xk that optimizes a given performance criterion among all possible functions of

2012 IEEE Statistical Signal Processing Workshop (SSP)

978-1-4673-0183-1/12/$31.00 ©2012 IEEE 704


Y1, . . . ,Yk. The two criteria considered here are the conditional mean-square error (MSE):

MSE(Y1, . . . ,Yk) = E[ ||X̂k − Xk||² | Yk, . . . ,Y1 ]    (2)

and the (unconditional) mean-square error:

MSE = E[ ||X̂k − Xk||² ] = E[ MSE(Y1, . . . ,Yk) ].    (3)

The BKF provides the MMSE state estimator, according to both criteria above, and may be computed exactly in a recursive fashion, as shown in [1]. Briefly, let (x^1, . . . , x^{2^d}) be an arbitrary enumeration of the possible state vectors. For each time k = 1,2, . . ., define the posterior distribution vectors (PDV) Πk|k and Πk|k−1, of length 2^d, by means of

Πk|k(i) = P(Xk = x^i | Yk, . . . ,Y1),    (4)

Πk|k−1(i) = P(Xk = x^i | Yk−1, . . . ,Y1),    (5)

for i = 1, . . . , 2^d. Let the prediction matrix Mk, of size 2^d × 2^d, be the transition matrix of the Markov chain defined by the state model: (Mk)ij = P(Xk = x^i | Xk−1 = x^j) = P(nk = x^i ⊕ f(x^j)), for i, j = 1, . . . , 2^d. Additionally, given a value y of the observation vector, let the update matrix Tk(y), also of size 2^d × 2^d, be the diagonal matrix defined by the observation model: (Tk(y))jj = P(h(x^j, vk) = y), computed over the distribution of vk, for j = 1, . . . , 2^d. Finally, define the matrix A, of size d × 2^d, via A = [x^1 ⋯ x^{2^d}]. The following result, which appears in [1], gives a procedure to compute the MMSE state estimator.

Theorem 1. (Boolean Kalman Filter.) The optimal minimum-MSE estimator X̂k of the state Xk, given the observations Y1, . . . ,Yk up to time k, according to either criterion (2) or (3), is given by

X̂k = ( E[Xk | Yk, . . . ,Y1] )‾,    (6)

where the overline denotes component-wise thresholding: v̄(i) = I_{v(i)>1/2}, for i = 1, . . . , d. This estimator and its optimal conditional MSE can be computed by the following procedure.

1. Initialization Step: the initial PDV is given by Π0|0(i) = P(X0 = x^i), for i = 1, . . . , 2^d.

For k = 1, 2, . . ., do:

2. Prediction Step: given the previous PDV Πk−1|k−1, the predicted PDV is Πk|k−1 = Mk Πk−1|k−1.

3. Update Step: given the current observation Yk = yk, let βk = Tk(yk) Πk|k−1. The updated PDV Πk|k is obtained by normalizing βk into a probability measure: Πk|k = βk / ||βk||1.

4. MMSE Estimator Computation Step: the MMSE estimator is given by

X̂k = ( A Πk|k )‾    (7)

with optimal conditional MSE

MSE(Y1, . . . ,Yk) = || min{ A Πk|k, (A Πk|k)^c } ||1,    (8)

where the minimum is applied component-wise, and (A Πk|k)^c(i) = 1 − (A Πk|k)(i), for i = 1, . . . , d.
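The recursion above can be sketched directly in code. The sketch below assumes the Bernoulli noise model of the Section 4 example (nk(i) ~ Bernoulli(p), Yk = Xk ⊕ vk with vk(i) ~ Bernoulli(q)); a general h(x, v) would change only the construction of T(y). Function names are ours:

```python
import numpy as np
from itertools import product

def build_matrices(f, d, p, q):
    """Enumerate {0,1}^d and build A, M, and T(y) for Bernoulli noise."""
    states = [np.array(s) for s in product((0, 1), repeat=d)]  # x^1 .. x^{2^d}
    A = np.array(states).T                    # A = [x^1 ... x^{2^d}], size d x 2^d
    M = np.zeros((len(states), len(states)))
    for j, xj in enumerate(states):
        fx = f(xj)
        for i, xi in enumerate(states):
            flips = int(np.sum(xi ^ fx))      # (M)_ij = P(n_k = x^i XOR f(x^j))
            M[i, j] = p**flips * (1 - p)**(d - flips)
    def T(y):                                 # diagonal update matrix T_k(y)
        lik = [q**int(np.sum(y ^ xj)) * (1 - q)**(d - int(np.sum(y ^ xj)))
               for xj in states]
        return np.diag(lik)
    return A, M, T

def bkf_step(pi_prev, M, T_y, A):
    """One BKF iteration: prediction, update, and thresholded MMSE estimate."""
    pi_pred = M @ pi_prev                     # Pi_{k|k-1} = M_k Pi_{k-1|k-1}
    beta = T_y @ pi_pred
    pi_post = beta / beta.sum()               # Pi_{k|k} = beta_k / ||beta_k||_1
    cond_mean = A @ pi_post
    x_hat = (cond_mean > 0.5).astype(int)     # Eq. (7): threshold A Pi_{k|k}
    mse = float(np.minimum(cond_mean, 1 - cond_mean).sum())   # Eq. (8)
    return pi_post, x_hat, mse

# Tiny check: d = 2, identity network, uniform prior, one noisy observation.
A, M, T = build_matrices(lambda x: x, 2, 0.1, 0.05)
pi1, x_hat, mse = bkf_step(np.full(4, 0.25), M, T(np.array([1, 0])), A)
```

With a uniform prior and small observation noise, one update already pins the estimate to the observed state, and the conditional MSE (8) reflects the residual per-component uncertainty.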

3. JOINT STATE AND PARAMETER ESTIMATION

Operation of the BKF requires the network function f in (1) to be known. In practice, this may not be the case. We address the case of an unknown, or partially known, network function by means of a parametric approach: the network function is one of a family of possible networks, indexed by a parameter w. The signal model can be rewritten as:

Xk = f(Xk−1, w) ⊕ nk    (state model)
Yk = h(Xk, vk)    (observation model)    (9)

Notice that the true parameter w is fixed; in particular, it is not a random variable, and it does not change with time. The parameter represents the part of the model that is unknown and must be estimated from the noisy observations, along with the state. Here we propose to code this information into a Boolean parameter vector w ∈ {0,1}^l, so that the BKF can be applied to its estimation.

3.1. Parameter Process

In order to employ the BKF in the estimation of w, we introduce a random process {Wk; k = 0,1, . . .}, where Wk ∈ {0,1}^l, the same space that contains the true parameter w. Following an approach used in the linear case [2], we model the parameter process as a random walk:

Wk = Wk−1 ⊕ rk    (parameter model)    (10)

where {rk; k = 1,2, . . .} is a white-noise process, with rk ∈ {0,1}^l.

In the next two subsections, we discuss two approaches to estimating the unknown parameter w. Both are based on the idea of making the conditional statistics of Wk, given the observations Y1, . . . ,Yk up to time k, become close to w as k increases, so that an estimator Ŵk of Wk given the observations also serves as an estimate of w. The marginal distribution P(Wk) at each time k may be interpreted as “information” about w. However, it can be shown, under mild conditions on the noise rk, that P(Wk) quickly converges to the uniform distribution. In particular, any prior information contained in the initial distribution P(W0) is quickly dissipated.
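The dissipation of prior information can be checked in closed form in the special case of independent bit flips (an assumption for illustration; the model only requires rk to be white). If one bit of Wk flips with probability r at each step, its marginal pk = P(Wk(i) = 1) satisfies pk = (1 − r) pk−1 + r (1 − pk−1), i.e., pk − 1/2 = (1 − 2r)^k (p0 − 1/2), which decays geometrically to the uniform value 1/2:

```python
def bit_marginal(p0, r, k):
    """Closed-form marginal P(W_k(i) = 1) for one bit of the XOR random walk:
    p_k - 1/2 = (1 - 2r)^k (p_0 - 1/2)."""
    return 0.5 + (1 - 2 * r) ** k * (p0 - 0.5)

# Cross-check the closed form against the one-step recursion, starting from
# a fully confident prior p_0 = 1 with r = 0.01 as in the Section 4 example.
p_now = 1.0
for _ in range(1000):
    p_now = (1 - 0.01) * p_now + 0.01 * (1 - p_now)
```

After 1000 steps with r = 0.01, the marginal is within about 10^-9 of 1/2: even a fully confident prior is washed out, which is exactly the dissipation described above.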



3.2. Joint BKF Approach

A natural idea for the estimation of Wk is to combine it with the state vector Xk into a new state vector [Xk, Wk]^T and perform joint estimation of the corresponding combined process [2, 3]. This is done by modifying the signal model in (9) to replace w by Wk, and adding equation (10), to obtain

[Xk; Wk] = [f(Xk−1, Wk); Wk−1] ⊕ [nk; rk]    (state model)
Yk = h(Xk, vk)    (observation model)    (11)

The BKF is applied to this new Boolean signal model to obtain the MMSE estimator [X̂k, Ŵk]^T, which provides the estimated network function f̂k = f(·, Ŵk). This approach is called the joint BKF approach to network inference. Notice that the signal model (11) is not the same as the original one in (9). The question of convergence of Ŵk to w, and of f̂k to f, is therefore an important one. This problem was investigated in the linear case in [4].
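The state augmentation behind (11) amounts to stacking X and W into one Boolean vector and wrapping f into an augmented network function, to which the machinery of Section 2 then applies over 2^{d+l} states. The sketch below is ours; for simplicity it applies f with the previous parameter value (i.e., f(Xk−1, Wk−1)), a common variant, whereas (11) uses Wk:

```python
import numpy as np

def augment(f, d):
    """Return the augmented network function F acting on z = [x; w]:
    F(z) = [f(x, w); w] -- the parameter block evolves as the identity,
    so XOR-ing the stacked noise [n; r] reproduces the random walk (10)."""
    def F(z):
        x, w = z[:d], z[d:]
        return np.concatenate([f(x, w), w])
    return F

# Toy check with a hypothetical network f(x, w) = x XOR w (d = l = 2).
F = augment(lambda x, w: x ^ w, 2)
z_next = F(np.array([1, 0, 0, 1]))
```

Running the BKF on F with noise [nk; rk] yields the joint filter; the price is the exponential growth of the PDV, from 2^d to 2^{d+l} entries.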

3.3. Dual BKF Approach

An alternative approach is to run two separate “state” estimators in parallel, where the first estimates the state while the second estimates the parameters, each feeding its estimates to the other [2]. In the present case, this is accomplished by writing two separate Boolean signal models:

“State” Signal Model:

Xk = f(Xk−1, Ŵk−1) ⊕ nk    (state model)
Yk = h(Xk, vk)    (observation model)    (12)

“Parameter” Signal Model:

Wk = Wk−1 ⊕ rk    (“state” model)
X̂k = f(X̂k−1, Wk) ⊕ nk    (“observation” model)    (13)

The BKFs applied to these signal models are called the state BKF and the parameter BKF, respectively. The state BKF is fed the parameter estimate from the previous time point by the parameter BKF and computes a new state estimate, which it feeds back to the parameter BKF as an “observation” of the parameter process. The parameter BKF then updates its estimate, and the cycle restarts. Notice that, even if nk is time-invariant, the transition matrix for the state BKF is time-variant, being a function of the estimate Ŵk−1; so are the update matrices for the parameter BKF, which depend on X̂k and X̂k−1. One advantage of the dual approach is that the “state” vectors in each filter are smaller than the combined state-and-parameter vector of the joint approach, which can significantly reduce the computational burden.
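A minimal one-bit instance may clarify the feedback structure of the dual loop. Everything in this sketch is hypothetical: a single state bit X and parameter bit W, a toy network f(x, w) = x XOR w, observation Yk = Xk ⊕ vk, and ad-hoc noise levels; each filter is just the two-state Bayes recursion of Section 2:

```python
import numpy as np

p, q, r = 0.05, 0.05, 0.01    # state, observation, and parameter noise levels

def bern(bit, prob):
    """Probability that a Bernoulli(prob) noise bit equals 'bit' (0 or 1)."""
    return prob if bit else 1.0 - prob

def dual_bkf(ys):
    """Alternate a state BKF and a parameter BKF on the toy one-bit system."""
    pi_x = np.array([0.5, 0.5])           # posterior over X_k in {0, 1}
    pi_w = np.array([0.5, 0.5])           # posterior over W_k in {0, 1}
    x_prev = 0
    for y in ys:
        w_hat = int(pi_w[1] > 0.5)        # estimate fed to the state BKF
        # State BKF: predict with f(., w_hat), then update with Y_k = y.
        M = np.array([[bern(xi ^ xj ^ w_hat, p) for xj in (0, 1)]
                      for xi in (0, 1)])  # M_ij = P(n_k = x_i XOR f(x_j, w_hat))
        pred = M @ pi_x
        lik = np.array([bern(y ^ 0, q), bern(y ^ 1, q)])
        pi_x = lik * pred; pi_x /= pi_x.sum()
        x_hat = int(pi_x[1] > 0.5)
        # Parameter BKF: random-walk prediction, then x_hat as "observation".
        pi_w = np.array([(1 - r) * pi_w[0] + r * pi_w[1],
                         r * pi_w[0] + (1 - r) * pi_w[1]])
        lik_w = np.array([bern(x_hat ^ x_prev ^ 0, p),
                          bern(x_hat ^ x_prev ^ 1, p)])
        pi_w = lik_w * pi_w; pi_w /= pi_w.sum()
        x_prev = x_hat
    return pi_w

# With w = 1 the noise-free state toggles every step, so an alternating
# observation sequence should drive the parameter posterior toward w = 1.
pi_w = dual_bkf([1, 0, 1, 0, 1, 0])
```

The key design point visible here is the cross-feedback: each filter stays two-dimensional, instead of the four-dimensional PDV a joint filter over [X, W] would require.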

4. NUMERICAL EXAMPLE

Here we illustrate the joint BKF approach of Section 3.2 by means of a simple example. The Boolean dynamical system consists of four variables observed through additive binary noise and is given by the following equations:

Xk(1) = (Xk−1(1) AND Xk−1(3)) ⊕ nk(1)
Xk(2) = (Xk−1(2) NOR Xk−1(4)) ⊕ nk(2)
Xk(3) = (Xk−1(1) OR Xk−1(4)) ⊕ nk(3)
Xk(4) = (Xk−1(1) XOR Xk−1(3)) ⊕ nk(4)
Yk = Xk ⊕ vk    (14)

where the noise vectors are assumed independent, with nk(i) ∼ Bernoulli(p) and vk(i) ∼ Bernoulli(q), for i = 1, . . . , 4, where 0 < p, q < 0.5. The noise parameter p gives the amount of “perturbation” of the Boolean state process: the closer it is to p = 0.5, the more chaotic the system will be, while a value of p close to zero means that the state trajectories are nearly deterministic, governed tightly by the logic gates. From the perspective of network inference (i.e., parameter estimation), a value of p close to 0.5 makes identification of the system hard, due to the occurrence of too many “false” transitions; however, a value of p too close to zero means that the system will spend more time locked into attractors, reducing the diversity of state transitions “seen,” which also makes the inference problem hard. In this example, we set p = 0.05. On the other hand, q gives the intensity of the observation noise, being related to its variance q(1 − q). The noise is maximal at q = 0.5, while a value of q close to zero means that the observations are tightly correlated with the clean state signal. In contrast to the perturbation noise, inference performance here is monotone in q, always being better the smaller q is. In this example, we consider two values: q = 0.01 and q = 0.1.
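The four logic updates in (14) translate directly into a network function on the previous state Xk−1 (the function name below is ours):

```python
def net_f(x):
    """Network function of Eq. (14); x = X_{k-1} as a length-4 list of 0/1 bits.
    Returns the noise-free next state, before the XOR with n_k."""
    x1, x2, x3, x4 = x
    return [x1 & x3,          # X_k(1): AND gate on variables 1 and 3
            1 - (x2 | x4),    # X_k(2): NOR gate on variables 2 and 4
            x1 | x4,          # X_k(3): OR gate on variables 1 and 4
            x1 ^ x3]          # X_k(4): XOR gate on variables 1 and 3
```

For instance, from the all-zero state only the NOR target fires, so the state immediately leaves the origin even without perturbation noise.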

To set up the inference example, we assume that the wiring of the network (i.e., the pair of predictors for each target) and the logic gates for the first two targets are known, but the identities of the last two logic gates are unknown. Since there are 2^4 = 16 possible logic gates with two inputs, the parameter vector w consists of 8 bits, with 4 bits encoding each unknown gate.

The parameter-process vector Wk and the noise vector rk are therefore also 8-bit words. The noise rk is assumed independent, with rk(i) ∼ Bernoulli(r), for i = 1, . . . , 8, where 0 < r < 0.5. The larger r is, the faster Wk moves in its random walk through parameter space. The value of r has a



direct impact on the convergence properties of the estimationmethods. In this example, we set r = 0.01.

The simulation is carried out by running the signal model in (9) with the true value of the parameter w, in order to obtain observations Y1, . . . ,Y100. These are fed sequentially to the joint BKF, with a combined state-and-parameter vector of size 12. Figures 1(a) and (b) display the results for the cases q = 0.01 and q = 0.1, respectively. Displayed are both the inferred logics (top two panels) and the corresponding conditional MSEs (bottom two panels). We observe that convergence to the correct logics is very fast under less noise (q = 0.01), yielding small MSE values, but that convergence is somewhat compromised with noisier observations (q = 0.1), with larger MSE values.

5. CONCLUSION

This paper proposed the application of the BKF in the context of joint estimation of state and parameters, for the case where only incomplete information about the system is available. A simple example was given to demonstrate the methodology and illustrate some of the issues involved.

Future work will investigate the comparison in performance between the joint and the dual estimation methods in network inference, as well as comparisons with other methods, such as maximum-likelihood approaches. The impact of the various noise parameters on inference performance also needs to be investigated in more detail. The problem of system identifiability, which appears in the inference of the network wiring, will be addressed in a future publication. Finally, when there is little prior knowledge, the correspondingly large dimensionality of the parameter vector may make exact computation of the BKF prohibitive, necessitating the use of approximate sequential Monte Carlo methods, such as the particle filter [5].

6. REFERENCES

[1] U. Braga-Neto, “Optimal state estimation for Boolean dynamical systems,” in Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, 2011.

[2] A. Nelson, Nonlinear Estimation and Modeling of Noisy Time-Series by Dual Kalman Filtering Methods. PhD thesis, Oregon Health and Science University, Portland, Oregon, 2000.

[3] J. Candy, Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods. New York, NY: Wiley, 2009.

[4] L. Ljung, “Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems,” IEEE Transactions on Automatic Control, vol. AC-24, no. 1, pp. 36–50, 1979.

[5] D. Simon, Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches. New York, NY: Wiley-Interscience, 2006.

Fig. 1. Joint BKF results for the simple example. (a) Clean observations (q = 0.01); (b) noisy observations (q = 0.1). Displayed are both the inferred logics (top panels) and the corresponding conditional MSEs (bottom panels).
