
OVERVIEW OF EMERGING BAYESIAN APPROACH TO NONLINEAR SYSTEM IDENTIFICATION

Biao Huang ∗,1   Qing-Guo Wang ∗∗

∗ Department of Chemical and Materials Engineering, University of Alberta, Edmonton AB T6G 2G6, Canada
∗∗ Dept. of Electrical and Computer Engineering, National University of Singapore, Singapore 119260

Abstract: Over the last twenty years, nonlinear system identification has gained significant interest due to the increasing demand for high-performance control. Nonlinear system identification is challenging owing to its complexity, unpredictability, and the curse of dimensionality. The literature has mainly focused on two classes of modeling. One is the most studied black-box modeling, which is entirely data-based. Another important class of system identification is grey-box modeling, i.e., the model structure of the underlying system is known, but the parameters and state variables are unknown. The general Bayesian approach to nonlinear system identification targets not only the black-box problem but also the grey-box problem. The Bayesian inference idea is certainly not new, but its usefulness in nonlinear system identification was not proven until recent years, owing to the increasing capacity of computational power. This paper gives an overview of its most significant development in recent years, namely the unscented Kalman filter (UKF) for sequential inference of parameters/states. The UKF can be used for on-line as well as off-line applications. The general solution to the sequential Bayesian inference problem is first reviewed, and the paper then focuses on the unscented Kalman filter as a representative of the emerging Bayesian filtering approach to system identification. We endeavor to explain mathematical concepts in simple language and provide a tutorial on some key results. Several engineering examples are presented to illustrate the new developments and their utilities.

Keywords: Nonlinear system identification, Kalman filter, unscented Kalman filter, Bayesian statistics, black-box modeling, grey-box modeling

1. INTRODUCTION

System identification is one of the most studied subjects in the systems and control discipline. However, most of the work has focused on linear system identification, where elegant mathematical derivations and theories exist [Ljung, 1999; Soderstrom and Stoica, 1989]. Nonlinear system identification is particularly important in the process industry, since almost all physical systems have a certain degree of nonlinearity, ranging from valve nonlinearity to highly nonlinear chemical reactions. An accurate description of the underlying nonlinearity is required for high-performance control and monitoring of many industrial processes.

1 Corresponding author; Tel: +1(780)492-9016, Fax: +1(780)492-2881, E-mail: [email protected]

Over the last twenty years, nonlinear system identification has seen significant progress. Numerous publications have appeared in the literature, ranging from relatively simple bilinear system identification, Hammerstein and Wiener model identification, to chaotic system identification. According to the bibliography prepared by Georgios B. Giannakis and Erchin Serpedin, there were 1410 publications in nonlinear system identification by the year 2000. Nonlinear system identification is challenging owing to its complexity, unpredictability, and the curse of dimensionality. Attention has therefore been focused on black-box methods. But the black-box approach has certain obvious limitations. One is its limited validity: models identified by the black-box approach cannot, in general, be extrapolated. Black-box models are difficult to interpret, and their parameters do not have direct physical interpretations.

However, as pointed out by Pearson and Ogunnaike (1997), the design of a process control system requires one to match the real-world process dynamics with control system design methodologies. The model developed must be compatible in structure and complexity with the requirements of the control system design methodologies. To be precise, model identification is an exercise in model reduction [Huang and Shah, 1997]. The mismatch due to the approximation can be compensated to some extent by feedback control. Following this line, numerous nonlinear model structures have been proposed. In [Pearson and Ogunnaike, 1997], nonlinear model structures are broadly classified into two categories, continuous-time models and discrete-time models. Among continuous-time models, the notable ones are control-affine models [Kantor, 1987], Volterra models [Boyd and Chua, 1985], Hammerstein models [Greblicki and Pawlak, 1989], and Wiener models [Greblicki, 1992]. Among discrete-time models, the notable ones are the well-known nonlinear autoregressive moving average models with exogenous inputs (NARMAX), which have their origin in linear ARMAX models, nonlinear autoregressive models with exogenous inputs (NARX), nonlinear additive autoregressive models with exogenous inputs (NAARX), and Volterra models.

It is, however, not practical to solve the general nonlinear model identification problem without imposing some working constraints on the structure. To build tractable identification algorithms, Pearson and Ogunnaike (1997) suggested limiting identification algorithms to four special cases of the model structure, namely polynomial NARMAX models, NAARX models, Volterra models, and block-oriented models.

Other than black-box modeling, there is white-box modeling. In [Lindskog, 1996], white-box models are defined as models that reflect all properties of the true system. They are constructed solely from prior knowledge and physical principles, without any use of measurements from the system. However, this type of modeling is not realistic in practice. If one pursues white-box modeling but tunes some of the physical constants, then one is also doing system identification [Lindskog, 1996]; this is known as physically parameterized modeling. Therefore, under a broad definition of system identification, all identification methods can be placed on a scale ranging from pure black-box to physically parameterized modeling. However, physically parameterized modeling requires strong expertise in the discipline of the physical system of interest. The model structure is usually overly complex, and it is difficult to achieve without relying on certain approximations.

Another important class of system identification is the grey-box approach, i.e., part of the model structure of the underlying system is derived according to first principles, and the other parts may be approximations. This a priori knowledge is important and should be included, particularly in nonlinear system identification, if a reliable nonlinear dynamic model is desired. The grey-box approach is similar to the traditional parameter estimation problem. However, compared to traditional nonlinear parameter estimation, where all variables are available, grey-box nonlinear dynamic system identification is more challenging in the sense that the states of the system are also unknown. The general approach to treating the problem of estimating parameterized models from time series amounts to a state-space description [Sitz et al., 2002]. But even for a linear state-space model, estimation of the model parameters becomes a nonlinear estimation problem owing to both unknown states and unknown parameters.

For both black-box and grey-box modeling, the emerging Bayesian approach has become increasingly attractive in recent years owing to the increasing capacity of computational power. Among many recent developments, the most significant ones are the unscented Kalman filter (UKF) [Julier and Uhlmann, 2004; Norgaard et al., 2000; Wan and van der Merwe, 2001; Wan et al., 2000] and the particle filter [Doucet et al., 2001; Gordon et al., 1993]. They can be used not only for off-line but also for on-line parameter/state estimation. Successful applications of the UKF have been reported even for chaotic model identification [Sitz et al., 2002]. Many conventional system identification methods, such as subspace identification, may also be classified under the same framework. The Bayesian approach, putting it in


simple words, is to estimate a probability distribution/density function (pdf) of the parameter/state estimate by using all available measurements. Expressed as p(Parameter|Evidence), the operator p indicates uncertainty in determining the parameter from the available evidence. As more evidence becomes available, the uncertainty generally decreases, unless misleading evidence is used. Based on this, the Bayesian approach is often used for on-line estimation that utilizes all evidence up to the current time, with the estimation proceeding progressively as new evidence becomes available. Once this p(Parameter|Evidence) is calculated, a specific estimate of the parameter can be obtained according to a certain optimization criterion, such as the mean estimate (minimum mean square error estimate) or the maximum a posteriori (MAP) estimate.

The optimal Bayesian solution is general, provides the full probability density function, and can cope with multi-modality, asymmetries, and discontinuities [Julier and Uhlmann, 2004]. This is in contrast to classical estimation, which has generally assumed Gaussian distributions and linearity in the parameters. However, not all pdfs can be described by a finite number of parameters (cf. a Gaussian distribution needs only two, the mean and variance). Therefore, practical solutions of the general Bayesian inference must rely on approximations of the pdf, and a high computational load is generally expected.

A variety of pdf approximations have been proposed in the literature. Unfortunately, most of them are either computationally unmanageable or require special assumptions about the form of the process and observation models that cannot be satisfied in practice. The Kalman filter, which relies on the mean and variance to approximate the pdf, is still the most popular approach. The extended Kalman filter (EKF) is probably the most widely used estimation algorithm for nonlinear systems. As stated in [Julier and Uhlmann, 2004], more than 35 years of experience in the estimation community has shown that it is difficult to implement, difficult to tune, and only reliable for systems that are almost linear on the time scale of the updates. In a series of papers by Julier and coworkers, it is shown that the regular EKF is only accurate up to the first order in estimating the mean and covariance, and yet it requires computation of the Jacobian. As an improvement, the second-order EKF has been proposed, but it demands extensive implementation effort due to the need for Hessian matrices.

Another intuitive way of filtering is to estimate the pdf directly by means of a representative set of samples. Applying the state-space equations to these samples yields new prediction samples, from which statistics such as the mean and covariance can be estimated and used in the Kalman filter update equations [Sitz et al., 2002]. Along this line, Julier and Uhlmann (2004) developed a novel extension of the Kalman filter for nonlinear systems, known as the unscented Kalman filter (UKF). This procedure belongs to the class of statistical linearization schemes in which pdfs are truncated [Sitz et al., 2002]. Higher-order moments of the density function are neglected, i.e., only the mean and covariance are used. A sample set with the same mean and covariance is generated and propagated through the full state-space model. Compared with the EKF, the UKF handles the nonlinearity in a simpler yet more elegant way, in the sense that a better quality of estimates is achieved at no greater computational load than the EKF. It has also been shown that the UKF is accurate up to second order in the estimation of the mean and covariance [Julier and Uhlmann, 2004]. Not relying on derivatives for the Jacobian, the UKF is capable of dealing with discontinuities. Due to the recursive structure of the UKF, the probability of stopping in a local minimum of the cost function is greatly reduced, and it can also be applied to unevenly sampled data.

A more comprehensive approach to approximating pdfs is particle filtering [Doucet et al., 2001; Gordon et al., 1993]. Particles are random samples of pdfs. A particle filter is essentially a sequential Monte Carlo simulation method. The overall goal is to directly implement optimal recursive Bayesian estimation by recursively approximating the complete pdfs [van der Merwe, 2004]. It approximates the posterior by a set of weighted samples without making any explicit assumption about its form and can thus be used in general nonlinear, non-Gaussian systems. A detailed review and discussion of the particle filter for engineering applications can be found in [Chen et al., 2004].

In the EKF literature, it has been well known that the parameter estimation problem can be formulated as a state estimation problem [Nelson, 2000]. The same approach has been carried over to the UKF [van der Merwe, 2004; Wan and van der Merwe, 2000; Gustafsson and Hriljac, 2003]. The aim of this paper is to review and interpret recent developments of the UKF and its application to parameter estimation, and to explore its solutions for engineering problems. To this end, the paper first discusses the formulation of the parameter estimation problem under the state estimation framework. We then introduce a general solution to the sequential Bayesian inference problem. After reviewing the Bayesian interpretation of the Kalman filter and the extended Kalman filter, we focus on the unscented Kalman filter approach to parameter estimation. Throughout the paper, we endeavor to explain mathematical concepts in a


simple language. Several examples are provided to illustrate the application of the UKF to nonlinear system identification in engineering problems.

2. STATE SPACE FORMULATION FOR NONLINEAR SYSTEM IDENTIFICATION

We begin our discussion by considering a nonlinear function without dynamics

    yt = g(ut, θ) + et

where ut, yt, and θ are the input, output, and unknown parameter, respectively. Given a set of data ut and yt, the general task of parameter estimation is to estimate θ so as to minimize the residual

    εt = yt − g(ut, θ)      (1)

in a certain optimal fashion.

This problem may be formulated as a state estimation problem:

    θt+1 = θt

    yt = g(ut, θ) + et

If the parameter θ is also time varying, then the problem may be formulated as

    θt+1 = θt + wt

    yt = g(ut, θ) + et

where the distribution of wt determines how much the parameters vary with time. The distribution of wt can, on the other hand, also be used as a tuning parameter for the estimation: choosing a larger variance of wt allows faster tracking of the parameter. Clearly, if we are able to estimate the complete pdf of the parameter, an optimal estimate that minimizes the residual in eqn. (1), consistent with conventional parameter estimation criteria, can be obtained.
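As a minimal sketch of this tuning mechanism, assume the scalar linear case g(ut, θ) = θut, for which the random-walk formulation reduces to a scalar Kalman filter; the noise levels and variable names below are illustrative, not from the paper:

```python
import random

random.seed(0)
theta_true, q, r = 2.0, 1e-6, 0.01    # true parameter, Var(w_t), Var(e_t)
theta_hat, P = 0.0, 10.0              # initial guess and its variance

for _ in range(200):
    u = random.uniform(-1.0, 1.0)
    y = theta_true * u + random.gauss(0.0, r ** 0.5)
    P += q                            # prediction: theta_{t+1} = theta_t + w_t
    K = P * u / (u * u * P + r)       # gain for y_t = theta_t * u_t + e_t
    theta_hat += K * (y - theta_hat * u)
    P *= 1.0 - K * u                  # posterior variance of the estimate
```

Increasing q inflates P at every prediction, which keeps the gain K large and lets the estimate track a drifting parameter faster, at the price of a noisier estimate; q → 0 recovers the time-invariant case.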

Now consider a nonlinear dynamic system represented by a nonlinear state-space model [Ljung, 1999]:

zt+1 = f(zt, ut, θt) + vt (2)

yt = h(zt, ut, θt) + et (3)

where z ∈ Rn, θt ∈ Rd, u ∈ Rq, and y ∈ Rm are the state, unknown parameter, input, and output, respectively.

The state and parameter estimation problem can be formulated as:

    [ zt+1 ]   [ f(zt, ut, θt) ]   [ vt ]
    [ θt+1 ] = [      θt       ] + [ wt ]      (4)

    yt = h(zt, ut, θt) + et                    (5)

Let

    xt = [ zt ]
         [ θt ]

Then the estimation of the parameter and state can be formulated as the estimation of xt from the following nonlinear model with additive noise:

xt+1 = f(xt, ut) + vt

yt = h(xt, ut) + et

Removing the assumption of additivity of the noise, a more general formulation is obtained:

xt+1 = f(xt, ut, vt)

yt = h(xt, ut, et)
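The stacking in (4)-(5), with the parameters following a random walk, can be written down directly in code. In the sketch below, `make_augmented` is a hypothetical helper name and the scalar dynamics are a toy example:

```python
import numpy as np

def make_augmented(f, h, n_z):
    """Build augmented dynamics for x = [z; theta], as in eqns (4)-(5)."""
    def f_aug(x, u):
        z, theta = x[:n_z], x[n_z:]
        # theta propagates unchanged; its random-walk noise w_t is added outside
        return np.concatenate([f(z, u, theta), theta])
    def h_aug(x, u):
        z, theta = x[:n_z], x[n_z:]
        return h(z, u, theta)
    return f_aug, h_aug

# toy model: scalar dynamics z_{t+1} = theta*z_t + u_t with unknown gain theta
f = lambda z, u, th: th * z + u
h = lambda z, u, th: z
f_aug, h_aug = make_augmented(f, h, n_z=1)

x = np.array([1.0, 0.5])              # z = 1, theta = 0.5
print(f_aug(x, 0.2))                  # prints [0.7 0.5]
```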

An alternative formulation was proposed by [Gustafsson and Hriljac, 2003], where the augmented state-space model is written as

    [ zt+1 ]   [ f(zt, θt) ]   [ v^z_t + w^z_t ]
    [ θt+1 ] = [    θt     ] + [ v^θ_t + w^θ_t ]

    yt = h(zt, θt) + et

where v^z_t is the physical state noise, v^θ_t is the noise that represents parameter uncertainty, w^z_t is the state roughening noise, and w^θ_t is the parameter roughening noise. The roughening noise is a second-level tuning parameter. For example, in the identification of a time-invariant system, where v^θ_t = 0, w^θ_t should be chosen with a variance decaying to zero to yield converging parameter estimates.

The parameter estimation problem is then to estimate the state xt using all information up to time t, i.e., to estimate xt using yt, yt−1, . . . , y0.

3. SEQUENTIAL BAYESIAN INFERENCE

Consider the nonlinear state-space model:

xt+1 = f(xt, vt) (6)

yt = h(xt, et) (7)

where, as stated in the preceding section, x ∈ Rn is the state, which can also include unknown parameters, and y ∈ Rm is the output. We do not include the exogenous input term ut in the equations, for simplicity of presentation, since ut is also an observation in the estimation problem, playing the same role as yt from an estimation point of view. This simplification does not mean the input is not needed in the experiment; from an experiment design point of view, it is certainly an important variable to be considered explicitly.

Denote Yt = {y0, . . . , yt}. The state estimation problem, in general, is formulated as finding the conditional


Fig. 1. Illustrative point estimate examples (the mean and MAP estimates of p(xt|Yt))

distribution function p(xt|Yt), i.e., the distribution of the state given the observations up to the current time. Once this conditional distribution function is known, the state estimate (point estimate) can be calculated as

    xt = Θ(p(xt|Yt))

where Θ is a nonlinear function operator. For clarity, we denote xt by xt|t in the sequel, meaning an estimate of xt given all observations up to time t. With p(xt|Yt) given, one can pick a point estimate of xt according to a certain optimizing criterion. A mean (or minimum mean square error) estimate of xt can be written as

    xt|t = ∫ xt p(xt|Yt) dxt

while the maximum a posteriori (MAP) estimate is

    MAP(xt|t) = arg max p(xt|Yt)

The mean estimate and MAP estimate are illustrated in Fig. 1.

Since the distribution is known, the variance of the estimate can also be computed. This is an important advantage of estimating the entire pdf rather than a point estimate only.
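For instance, with the posterior tabulated on a grid (a hypothetical Gaussian-shaped pdf below; all numbers are illustrative), the mean, MAP, and variance estimates all reduce to simple sums:

```python
import numpy as np

# a hypothetical posterior p(x_t | Y_t) tabulated on a grid
x = np.linspace(-6.0, 6.0, 2401)
dx = x[1] - x[0]
p = np.exp(-0.5 * ((x - 1.2) / 0.8) ** 2)
p /= p.sum() * dx                     # normalize so the pdf integrates to 1

mean_est = (x * p).sum() * dx         # minimum mean square error estimate
map_est = x[np.argmax(p)]             # maximum a posteriori (MAP) estimate
var_est = ((x - mean_est) ** 2 * p).sum() * dx   # uncertainty of the estimate
```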

The general solution of the state estimation problem is given by recursively solving the following equations [Bergman, 1999]:

    p(xt|Yt−1) = ∫ p(xt|xt−1) p(xt−1|Yt−1) dxt−1      (8)

    p(xt|Yt) = p(yt|xt) p(xt|Yt−1) / p(yt|Yt−1)       (9)

Since p(xt|xt−1) is a function of xt−1, (8) can also be written as

    p(xt|Yt−1) = E_{xt−1|Yt−1} [ p(xt|xt−1) ]

i.e., it is a conditional average of the pdf of the predicted state xt given all past observations up to t − 1. The pdf p(xt|xt−1) depends on the state function f(xt−1, vt−1); therefore p(xt|Yt−1) can be evaluated from f(xt−1, vt−1) and the pdfs of xt−1 and vt−1. This is known as the prediction of the pdf, and the procedure is illustrated in Fig. 2, where one can observe that a Gaussian pdf can be distorted owing to the nonlinearity.

Since p(yt|Yt−1) is a constant given all observations up to time t, (9) can also be written as

    p(xt|Yt) = α p(yt|xt) p(xt|Yt−1)      (10)

Fig. 2. pdf prediction

Fig. 3. pdf update

where p(yt|xt) is the likelihood function of yt. This is known as the update of the pdf by the newly available observation yt. This procedure is illustrated in Fig. 3.

To summarize, sequential Bayesian inference consists of two steps:

(I) Prediction step: predict the pdf p(xt|Yt−1) according to the state function f(xt−1, vt−1) via an expectation operation.

(II) Update step: update p(xt|Yt−1) to p(xt|Yt) through multiplication by the likelihood p(yt|xt).

This sequential procedure of Bayesian inference, starting from time 0, is illustrated in Fig. 4.

Fig. 4. Sequential Bayesian inferencing

If we consider p(xt|Yt−1) as the prior (i.e., before the update with the most recent observation yt) and p(xt|Yt) as the posterior (i.e., after the update), then eqn. (10) implies

    posterior ∝ likelihood × prior      (11)

This rule may be applied to the filtering of any nonlinear system with any distribution. The prior calculation is the so-called prediction, and the posterior is the filtering, in traditional Kalman filter terminology.
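On a discrete grid, this two-step recursion can be carried out literally: the prediction step, eqn. (8), becomes a matrix-vector product with the transition density, and the update, eqns. (9)-(10), a pointwise multiplication by the likelihood followed by normalization. The scalar random-walk model, grid, and noise levels below are all illustrative:

```python
import numpy as np

# Grid-based realization of the prediction/update recursion for an
# illustrative scalar model x_{t+1} = x_t + v_t, y_t = x_t + e_t,
# with v_t ~ N(0, 0.5^2), e_t ~ N(0, 0.4^2), prior x_0 ~ N(0, 2^2).
grid = np.linspace(-10.0, 10.0, 401)
dx = grid[1] - grid[0]
gauss = lambda z, s: np.exp(-0.5 * (z / s) ** 2)

p = gauss(grid, 2.0)
p /= p.sum() * dx                                  # prior p(x_0)
trans = gauss(grid[:, None] - grid[None, :], 0.5)  # p(x_t | x_{t-1})
trans /= trans.sum(axis=0) * dx                    # each column is a pdf

for y in [1.0, 1.3, 0.9]:                          # hypothetical observations
    p = trans @ p * dx                             # prediction step, eqn (8)
    p *= gauss(y - grid, 0.4)                      # multiply by the likelihood
    p /= p.sum() * dx                              # normalization constant alpha

mean_est = (grid * p).sum() * dx                   # point estimate from the pdf
```

For this linear-Gaussian choice the grid filter reproduces the Kalman filter answer, but the same loop runs unchanged for non-Gaussian noises or a nonlinear transition density.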

As an example of applying the general Bayesian inference rule, consider the model

y = θ + e (12)

where


    p(θ) = { 0.6 for θ = −1
           { 0.2 for θ = 0
           { 0.2 for θ = 1      (13)

and

    p(e) = { 0.2 for e = −1
           { 0.8 for e = 1      (14)

An observation y = 0 is obtained. The question is to find the Bayesian estimate of θ.

According to the general rule, eqn. (11), we are solving

    p(θ|y = 0) = α p(y = 0|θ) p(θ)

In view of the combinatorial scenarios arising from eqns. (12) and (14), the likelihood p(y = 0|θ) is easily calculated:

    p(y = 0|θ = −1) = 0.8    (if y = 0 and θ = −1, e must be 1)

    p(y = 0|θ = 0) = 0       (the combination y = 0, θ = 0 is not possible)

    p(y = 0|θ = 1) = 0.2     (if y = 0 and θ = 1, e must be −1)

Together with the prior given by eqn. (13), the following inference can be made:

    p(θ = −1|y = 0) = α p(y = 0|θ = −1) p(θ = −1) = α × 0.8 × 0.6 = 0.48α

    p(θ = 0|y = 0) = α p(y = 0|θ = 0) p(θ = 0) = α × 0 × 0.2 = 0

    p(θ = 1|y = 0) = α p(y = 0|θ = 1) p(θ = 1) = α × 0.2 × 0.2 = 0.04α

Since the probabilities must sum to 1,

    1 = p(θ = −1|y = 0) + p(θ = 0|y = 0) + p(θ = 1|y = 0) = 0.48α + 0.04α = 0.52α

    α = 1/0.52 ≈ 1.92

and the Bayesian inference yields the pdf of θ as

    p(θ|y = 0) = { 0.92 for θ = −1
                 { 0    for θ = 0
                 { 0.08 for θ = 1

Consequently, the mean estimate and MAP estimate are, respectively,

    θ = −1 × p(θ = −1|y = 0) + 0 × p(θ = 0|y = 0) + 1 × p(θ = 1|y = 0) = −0.84

    MAP(θ) = −1

This completes our illustration example.
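The arithmetic above can be checked mechanically. The sketch below recomputes the posterior, the normalizing constant α, and both point estimates; it keeps exact fractions, so the mean comes out as −11/13 ≈ −0.85, consistent with the −0.84 obtained above from the rounded posterior values 0.92 and 0.08:

```python
# numeric check of the worked example: posterior ∝ likelihood × prior, eqn (11)
prior = {-1: 0.6, 0: 0.2, 1: 0.2}        # p(theta), eqn (13)
p_e = {-1: 0.2, 1: 0.8}                  # p(e), eqn (14)
y = 0

lik = {th: p_e.get(y - th, 0.0) for th in prior}    # p(y=0|theta) = p(e = y - theta)
unnorm = {th: lik[th] * prior[th] for th in prior}  # likelihood * prior
alpha = 1.0 / sum(unnorm.values())                  # 1/0.52, approx. 1.92
post = {th: alpha * v for th, v in unnorm.items()}  # {-1: 12/13, 0: 0, 1: 1/13}

mean_est = sum(th * pr for th, pr in post.items())  # -11/13, approx. -0.85
map_est = max(post, key=post.get)                   # -1
```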

The solution of the general estimation problem can be complicated. For example, even a simple linear state-space model turns into a nonlinear estimation problem if some or all parameters are unknown. As a special case of the general nonlinear state-space model, a linear time-invariant state-space model is given by:

    xt+1 = Axt + vt      (15)

    yt = Cxt + et        (16)

where vt and et follow normal distributions. The celebrated Kalman filter solves the state estimation problem if all parameters are known. However, if all or some of the parameters in A and C are unknown, they have to be estimated and formulated as additional states, such as θt+1 = θt for the unknown parameters. The estimation of a linear system thus becomes a nonlinear problem. By augmenting the parameter states to the original states, the augmented system can be formulated under the general nonlinear state estimation framework.

It is interesting to compare subspace identification [van Overschee and Moor, 1996] with the Bayesian estimate for the linear system. In the first step of subspace identification, a set of states XN = {x0, . . . , xN} is directly estimated from the outputs YN = {y0, . . . , yN} through subspace projection. Under the Bayesian framework, this step is the estimation of p(XN|YN), although subspace identification only gives a point estimate of p(XN|YN). The second step of subspace identification is to estimate the parameter θ from the states XN. In the Bayesian framework, this is a point estimate from p(θ|XN). Thus subspace identification is indeed a special case of Bayesian inference. While in Bayesian inference the states and the unknown parameters are estimated simultaneously, in subspace identification they are estimated in sequence. And while the Bayesian estimate can provide a full pdf (and thus a confidence interval) of the estimate, subspace identification has difficulty obtaining confidence intervals.

4. KALMAN FILTER AND EXTENDED KALMAN FILTER

Sequential Bayesian inference is nothing but the propagation of pdfs. If the pdfs are simply Gaussian, the propagation of pdfs can be simplified to the propagation of the mean and covariance, since these two statistics completely determine a Gaussian pdf.

The KF is optimal (minimum mean square error) if the underlying system is linear and the noises are Gaussian. Even if the noises are not Gaussian, the KF is still optimal up to the first two moments within the class of linear estimators.

The KF consists of two steps: prediction followed by update, the same as the procedure of general sequential Bayesian inference. In the prediction step, the filter propagates the estimate from the previous time step to the current time step. Given the past estimate xt−1|t−1, the prediction step is straightforwardly solved by

    xt|t−1 = E[f(xt−1|t−1, vt−1)]

    Pt|t−1 = Cov[xt|t−1]

where xt|t−1 denotes a prediction of xt given all observations up to time t − 1.

The update step can be derived as the linear minimum mean-squared error estimator. Given that the mean is to be updated by the linear rule

    xt|t = xt|t−1 + Lt εt

    εt = yt − yt|t−1

    yt|t−1 = E[h(xt|t−1, et)]

    Pt|t = Cov[xt|t]

the weight matrix Lt is chosen to minimize the trace of the updated covariance matrix Pt|t, and the result is given by

    Lt = Σ^{xy}_t Σ_t^{−1}

where

    Σ^{xy}_t = Cov(xt|t−1, εt)

    Σt = Cov(εt)

Consequently, the updated covariance is given by

    Pt|t = Pt|t−1 − Lt Σt Lt^T
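These update equations are model-agnostic: any scheme that supplies the predicted mean, covariance, and cross covariance can reuse them unchanged. A sketch with a linear-measurement sanity check (the function name and the numbers are ours, not from the paper):

```python
import numpy as np

def kf_update(x_pred, P_pred, y, y_pred, S_xy, S):
    """Generic update: L = S_xy S^{-1}, x += L*eps, P -= L S L^T."""
    L = S_xy @ np.linalg.inv(S)        # weight matrix L_t
    eps = y - y_pred                   # innovation eps_t
    x_upd = x_pred + L @ eps
    P_upd = P_pred - L @ S @ L.T
    return x_upd, P_upd

# linear sanity check: for y = C x + e, S_xy = P C^T and S = C P C^T + R
C = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x_pred = np.array([1.0, 2.0])
P_pred = np.array([[1.0, 0.2], [0.2, 2.0]])
S = C @ P_pred @ C.T + R
S_xy = P_pred @ C.T
x_upd, P_upd = kf_update(x_pred, P_pred, np.array([1.4]), C @ x_pred, S_xy, S)
```

With these linear-model substitutions the update reduces to the textbook Kalman gain K = P C^T S^{−1}, which is how the UKF later reuses the same equations with moments computed from sigma points instead.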

Therefore, according to [Julier and Uhlmann, 2004], if the following sets of expectations are available: the predicted state (predicted mean) and its covariance (xt|t−1, Pt|t−1), the updated state (updated mean) and its covariance (xt|t, Pt|t), and the cross covariance between the predicted state and the predicted output, the KF update equations can be applied. When the system is linear, the above sets of statistics yield the well-known Kalman filter. When the system is nonlinear, methods for approximating these quantities (probability distribution functions) must be used. The problem of applying the KF to a nonlinear system becomes one of applying nonlinear transformations to the mean and covariance estimates [Julier and Uhlmann, 2004]. In the extended Kalman filter, the nonlinear functions g(x) and h(x) are linearized through Taylor series expansions in terms of the prediction error xt − xt|t, and the higher-order terms are ignored. This procedure results in the first-order EKF. It suffers from second- and higher-order linearization errors, the need for Jacobian matrices, and implementation difficulties. If the nonlinearities cannot be approximated well by linearization, the EKF can be biased and inconsistent. In this case, a higher-order approximation may be used instead. A second-order EKF, however, demands extensive implementation effort due to the need for Hessian matrices [Sitz et al., 2002].

5. UNSCENTED KALMAN FILTER

Instead of approximating the nonlinear functions, the unscented Kalman filter considers approximation of the probability distribution function.

Intuitively, the sets of means and covariances, or even the pdfs themselves, may also be estimated through direct Monte Carlo simulation. Suppose, for example, that x follows a distribution p(x) and we are interested in the pdf of y = f(x). In the first step, generate n samples of x according to p(x), and then calculate n samples of y according to the mapping y = f(x). A distribution of y can be estimated from the sampled y. With the distribution of y known, the second step is to calculate the mean and covariance of y. The cross covariance of y and x may also be calculated from the sampled data. With sufficiently large n, the mean and variance can be accurately estimated. However, the computational load of this approach is intensive. This Monte Carlo method for sequential Bayesian inference is nevertheless intuitive and is illustrated in Fig. 5, where the stars are the samples selected to represent the pdfs.

Fig. 5. Sequential Bayesian inferencing through Monte-Carlo simulation. (Prediction: samples drawn from p(x_{t-1}|Y_{t-1}) are propagated through f(x_{t-1}, v_{t-1}) to represent p(x_t|Y_{t-1}); update: the samples representing p(x_t|Y_{t-1}) are reweighted by the likelihood p(y_t|x_t) to represent p(x_t|Y_t).)
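As a sketch of the Monte-Carlo approach just described, the following estimates the mean, variance, and cross-covariance of y = f(x) by brute-force sampling. The function name and the linear map used for the sanity check are our own illustration, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_moments(f, mean, var, n=100_000):
    """Estimate mean(y), var(y), and cov(x, y) for y = f(x),
    with scalar x ~ N(mean, var), by direct Monte-Carlo sampling."""
    x = rng.normal(mean, np.sqrt(var), size=n)
    y = f(x)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return y.mean(), y.var(), cov_xy

# Sanity check with a linear map, y = 5x + 1 and x ~ N(2, 1):
# theory gives mean 11, variance 25, cross-covariance 5.
mu_y, var_y, cov_xy = mc_moments(lambda x: 5 * x + 1, 2.0, 1.0)
```

The estimates approach the theoretical values only at the O(1/sqrt(n)) Monte-Carlo rate, which is exactly the computational burden the unscented transformation avoids.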

Instead of performing Monte-Carlo sampling of x, controlled samples may greatly improve the efficiency. The unscented transformation introduced in [Julier and Uhlmann, 2004; van der Merwe, 2004] is one such efficient sampling approach. The controlled samples, known as sigma points, are not drawn at random; they are deterministically chosen so that they exhibit certain specific properties. As a result, high-order information about the distribution can be captured with a fixed, small number of points. Which samples of x drawn through p(x) can be considered important and reflective of the characteristics of the distribution? Intuitively, the points at 1 standard deviation, 2 standard deviations, etc., may be considered important. To this end, we give the general solution next.

Consider the nonlinear function y = f(x). After rigorous mathematical derivation, obtained by expanding the pdf and matching its 1st- and 2nd-order moments with the approximated ones, the unscented sampling proceeds as follows [Julier and Uhlmann, 2004; van der Merwe, 2004]:

A set of sigma points consists of p + 1 points x^(i), each point having an associated weight w^(i). The weights can be positive or negative but, to provide an unbiased estimate, they must obey the condition

    Σ_{i=0}^{p} w^(i) = 1

With those points, the mean ȳ and covariance Σ_y are calculated as follows.

(I) Instantiate each point through the function to yield the set of transformed sigma points

    y^(i) = f(x^(i))

(II) The mean is given by the weighted average of the transformed points

    ȳ = Σ_{i=0}^{p} w^(i) y^(i)

(III) The covariance is the weighted outer product of the transformed points

    Σ_y = Σ_{i=0}^{p} w^(i) (y^(i) − ȳ)(y^(i) − ȳ)^T

(IV) The cross-covariance is the weighted outer product of the original and the transformed points

    Σ_xy = Σ_{i=0}^{p} w^(i) (x^(i) − x̄)(y^(i) − ȳ)^T

One set of sigma points that satisfies the above conditions consists of a symmetric set of 2n points that lie on the √n-th covariance contour [Julier and Uhlmann, 2004]:

    x^(i) = x̄ + (√(nΣ_x))_i,      w^(i) = 1/(2n)
    x^(i+n) = x̄ − (√(nΣ_x))_i,    w^(i+n) = 1/(2n)

where n is the dimension of x and (√(nΣ_x))_i is the i-th row or column of the matrix square root of nΣ_x. Two features of the unscented transformation are worth noting: 1) its computational load is no more than that of the extended Kalman filter, yet no Jacobians need to be calculated; 2) it calculates the projected mean and covariance correctly to the second order.
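As a concrete sketch, the symmetric sigma-point set and steps (I)-(IV) above can be implemented in a few lines. The function name and the use of a Cholesky factor as the matrix square root are our own choices:

```python
import numpy as np

def unscented_transform(f, x_mean, P_x):
    """Propagate (x_mean, P_x) through y = f(x) using the symmetric
    set of 2n sigma points on the sqrt(n)-th covariance contour,
    each with weight 1/(2n)."""
    n = x_mean.size
    S = np.linalg.cholesky(n * P_x)              # a matrix square root of n*P_x
    X = np.hstack([x_mean[:, None] + S,          # x + columns of sqrt(n*Sigma_x)
                   x_mean[:, None] - S])         # x - columns of sqrt(n*Sigma_x)
    w = np.full(2 * n, 1.0 / (2 * n))            # equal weights, sum to 1
    Y = np.column_stack([f(X[:, i]) for i in range(2 * n)])
    y_mean = Y @ w                               # step (II): weighted mean
    dX, dY = X - x_mean[:, None], Y - y_mean[:, None]
    Sigma_y = (dY * w) @ dY.T                    # step (III): covariance
    Sigma_xy = (dX * w) @ dY.T                   # step (IV): cross-covariance
    return y_mean, Sigma_y, Sigma_xy
```

For a linear map y = Ax + b this transform reproduces A x̄ + b, A Σ_x Aᵀ, and Σ_x Aᵀ exactly, consistent with the second-order accuracy noted above.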

To illustrate the above procedure, consider a simple example of sigma points and mean/variance computation. For a scalar variable x with a normal distribution, two sigma points may be selected as µ_x − σ_x and µ_x + σ_x, as shown in Fig. 6, where µ_x and σ_x are the mean and standard deviation of the random variable. If x ∼ N(2, 1), the two sigma points are x^(1) = 1 and x^(2) = 3, respectively, each with weight 1/2. Consider the mapping y = 5x + 1, for which the theoretical mean and variance of y are easily derived as µ_y = 11 and σ²_y = 25; the covariance of y and x is likewise σ_xy = 5. Now, using the sigma points for numerical computation, the following results are obtained:

    µ_x = (1/2)x^(1) + (1/2)x^(2) = (1/2)(1 + 3) = 2
    σ²_x = (1/2)(x^(1) − µ_x)² + (1/2)(x^(2) − µ_x)² = (1/2)(1 − 2)² + (1/2)(3 − 2)² = 1

which are exactly the theoretical results. With these two sigma points, the corresponding mappings are

    y^(1) = 5x^(1) + 1 = 5 × 1 + 1 = 6
    y^(2) = 5x^(2) + 1 = 5 × 3 + 1 = 16

The following statistics can then be calculated according to the unscented calculation algorithm:

    µ_y = (1/2)y^(1) + (1/2)y^(2) = (1/2)(6 + 16) = 11
    σ²_y = (1/2)(y^(1) − µ_y)² + (1/2)(y^(2) − µ_y)² = (1/2)(6 − 11)² + (1/2)(16 − 11)² = 25
    σ_xy = (1/2)(x^(1) − µ_x)(y^(1) − µ_y) + (1/2)(x^(2) − µ_x)(y^(2) − µ_y)
         = (1/2)(1 − 2)(6 − 11) + (1/2)(3 − 2)(16 − 11) = 5

These numerical results are in complete agreement with the theoretical ones. For a nonlinear mapping, these statistics are accurate up to second order [Julier and Uhlmann, 2004; van der Merwe, 2004].
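The hand calculation above is easy to reproduce numerically; the snippet below is a direct transcription of the scalar example (variable names are our own):

```python
# Scalar sigma-point example: x ~ N(2, 1) mapped through y = 5x + 1.
mu_x, sigma_x = 2.0, 1.0
xs = [mu_x - sigma_x, mu_x + sigma_x]        # sigma points x(1) = 1, x(2) = 3
w = [0.5, 0.5]                               # equal weights
ys = [5 * x + 1 for x in xs]                 # mapped points y(1) = 6, y(2) = 16
mu_y = sum(wi * yi for wi, yi in zip(w, ys))
var_y = sum(wi * (yi - mu_y) ** 2 for wi, yi in zip(w, ys))
cov_xy = sum(wi * (xi - mu_x) * (yi - mu_y) for wi, xi, yi in zip(w, xs, ys))
# mu_y = 11.0, var_y = 25.0, cov_xy = 5.0 -- the theoretical values
```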

Clearly, the choice of sigma points is not unique. Julier and Uhlmann (2004) proposed a number of different choices of sigma points to adapt to different pdfs by introducing additional scaling factors. The set of sigma points may also be expanded, which gives the freedom to match higher-order moments such as skewness. van der Merwe (2004) went further to propose numerous unscented sampling procedures and showed the similarity between the sigma-point approach and classical weighted least squares, indicating a certain optimality of the unscented transformation.

Fig. 6. An illustrative sigma point example: x^(1) = µ_x − σ_x and x^(2) = µ_x + σ_x.

To conclude this section, we summarize the procedure for the unscented Kalman filter.

(I) Augment the state in eqns. (6) and (7) by including v_t and e_t, namely

    X_t = [x_t; v_t; e_t]

The new state-space equations are then written as

    X_{t+1} = f(X_t)
    y_t = h(X_t)

(II) A set of sigma points is chosen by applying the sigma-point selection algorithm to the augmented state, resulting in sigma points X^(i)_{t-1}, i = 0, . . . , p, and corresponding weights w^(i).

(III) The mapping of the sigma points in X_t and y_t is given by the nonlinear state-space equations,

    X^(i)_t = f(X^(i)_{t-1})
    y^(i)_t = h(X^(i)_{t-1})

(IV) The means of the predicted state and output are calculated as

    X_{t|t-1} = Σ_{i=0}^{p} w^(i) X^(i)_t
    y_{t|t-1} = Σ_{i=0}^{p} w^(i) y^(i)_t

(V) The variance and cross-covariance are calculated as

    P_{t|t-1} = Σ_{i=0}^{p} w^(i) (X^(i)_t − X_{t|t-1})(X^(i)_t − X_{t|t-1})^T
    Σ_t = Σ_{i=0}^{p} w^(i) (y^(i)_t − y_{t|t-1})(y^(i)_t − y_{t|t-1})^T
    Σ^{xy}_t = Σ_{i=0}^{p} w^(i) (X^(i)_t − X_{t|t-1})(y^(i)_t − y_{t|t-1})^T

(VI) The update step is given by

    X_{t|t} = X_{t|t-1} + L_t ε_t
    ε_t = y_t − y_{t|t-1}
    P_{t|t} = P_{t|t-1} − L_t Σ_t L_t^T

where L_t = Σ^{xy}_t Σ_t^{-1}.

Fig. 7. An illustrative UKF inference procedure: a few sigma points representing p(x_{t-1}|Y_{t-1}) are propagated through f(x_{t-1}, v_{t-1}) to approximate p(x_t|Y_{t-1}) (prediction) and combined with p(y_t|x_t) to approximate p(x_t|Y_t) (update).

The procedure of the UKF can be illustrated, in analogy to Monte-Carlo Bayesian inference, in Fig. 7 for a scalar state inference. It is essentially the propagation of "a few" sigma points to approximate the propagation of the entire pdf. Once again, the key difference between the UKF and the EKF is what is approximated: the EKF approximates the nonlinear mapping, while the UKF approximates the pdf.
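For concreteness, one predict/update cycle of steps (II)-(VI) might be sketched as follows. This sketch assumes additive process and measurement noise (covariances Q and R) instead of the augmented state X_t, redraws the sigma points after prediction, and uses the equally weighted symmetric sigma set; `ukf_step` is an illustrative name, not code from the paper:

```python
import numpy as np

def ukf_step(f, h, x, P, y_obs, Q, R):
    """One UKF predict/update cycle, assuming additive process noise Q
    and measurement noise R, with the symmetric sigma-point set."""
    n = x.size
    w = 1.0 / (2 * n)                                        # equal weights
    # --- Prediction ---
    S = np.linalg.cholesky(n * P)
    X = np.hstack([x[:, None] + S, x[:, None] - S])          # sigma points
    Xp = np.column_stack([f(X[:, i]) for i in range(2 * n)])
    x_pred = Xp.mean(axis=1)                                 # valid: equal weights
    dX = Xp - x_pred[:, None]
    P_pred = w * dX @ dX.T + Q
    # --- Redraw sigma points from the predicted distribution ---
    S2 = np.linalg.cholesky(n * P_pred)
    X2 = np.hstack([x_pred[:, None] + S2, x_pred[:, None] - S2])
    Yp = np.column_stack([h(X2[:, i]) for i in range(2 * n)])
    y_pred = Yp.mean(axis=1)
    dX2, dY = X2 - x_pred[:, None], Yp - y_pred[:, None]
    Sigma_t = w * dY @ dY.T + R                              # innovation covariance
    Sigma_xy = w * dX2 @ dY.T                                # cross-covariance
    # --- Update: gain L_t = Sigma_xy Sigma_t^{-1} ---
    L = Sigma_xy @ np.linalg.inv(Sigma_t)
    x_new = x_pred + L @ (y_obs - y_pred)
    P_new = P_pred - L @ Sigma_t @ L.T
    return x_new, P_new
```

For linear f and h this step reduces exactly to the standard Kalman filter, which makes the sketch easy to verify.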

6. ENGINEERING APPLICATION EXAMPLES

Parameter estimation using the unscented Kalman filter is demonstrated in this section through several engineering application examples.

Example 1. Consider a 2nd-order exothermic reaction [Muske and Edgar, 1997]:

    dC_A/dt = −k0 exp(−Ea/(RT)) C_A²
    dT/dt = (−∆H/(ρC)) k0 exp(−Ea/(RT)) C_A² + (UA/(VρC)) (T_c − T)

where T_c is a manipulated variable. The parameters of interest are


Fig. 8. 2nd order exothermic reaction: estimated par1, par2, concentration, and temperature trajectories.

    par1 = −(∆H/(ρC)) k0 = −30
    par2 = UA/(VρC) = 0.001

and temperature T is measured. A set of 600 data points is simulated with sampling time 0.05. T_c is a random sequence generated from the Matlab function idinput with magnitude [10 30] and frequency band [0 0.1]. The initial concentration is 5 mol/l and the initial temperature is 20°C. For the UKF, the initial guesses of states and parameters are C_0 = 2 mol/l, T_0 = 30°C, par1 = −20, and par2 = 0.5. The estimation results are shown in Fig. 8, where convergence of all states and parameters is observed. The UKF has also been applied to the estimation of the exponents of the above model, which are extremely nonlinear and sensitive, but the results do not converge even with multi-level perturbations. Further studies are needed to determine whether this is a limitation of the UKF or due to an identifiability or initial-condition problem.
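To indicate how such a joint state/parameter problem is posed for the UKF, the reactor dynamics can be Euler-discretized and the two unknown parameters appended to the state as random-walk components. The constants below (k0, Ea/R) are placeholders for illustration, not the values used in the paper:

```python
import numpy as np

# Placeholder constants for illustration only (not the paper's values)
k0, Ea_R, dt = 1.0e3, 1.0e3, 0.05

def f_aug(z, Tc):
    """Euler-discretized reactor dynamics with the unknown parameters
    par1 = -(dH/(rho*C))*k0 and par2 = UA/(V*rho*C) appended to the
    state as random-walk components."""
    CA, T, p1, p2 = z
    arr = np.exp(-Ea_R / T) * CA ** 2            # Arrhenius-type rate term
    CA_next = CA + dt * (-k0 * arr)
    T_next = T + dt * (p1 * arr + p2 * (Tc - T))
    return np.array([CA_next, T_next, p1, p2])   # parameters propagate unchanged

def h_aug(z):
    return np.array([z[1]])                      # only temperature is measured
```

The UKF then treats the 4-dimensional augmented vector as the state, so parameter estimates are produced as a by-product of state filtering.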

Example 2. Consider a highly nonlinear system discussed in [Gordon et al., 1993]:

    x_t = 0.5 x_{t-1} + 25 x_{t-1}/(1 + x²_{t-1}) + u_{t-1} + v_{t-1}
    y_t = x²_t/20 + e_t
    u_t = 8 cos(1.2(t − 1)),  v_t ∼ N(0, 10),  e_t ∼ N(1, 1)

where the parameters of interest are the three coefficients in the above equations:

    par1 = 0.5,  par2 = 25,  par3 = 20

A set of 2000 data points is simulated with u_t = 8 cos(1.2(t − 1)) and initial state x_0 = 0.1. For the UKF, the initial guesses of the state and parameters are x_0 = 0.15, par1 = 1, par2 = 20, par3 = 25. The estimation results for the state and the three parameters

Fig. 9. Parameter and state estimation of highly nonlinear system (par1, par2, par3, and state trajectories).

Fig. 10. Parameter and state estimation of virus dynamic model (virus reproduction rate k, virus self death rate u, and the uninfected cell, infected cell, and virus trajectories).

are shown in Fig. 9, where convergence of the state and the first two parameters is observed, while the third parameter has a bias error.
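For reference, a data set for this benchmark could be generated, for instance, as follows; the exact time indexing of u_{t-1}, the function name, and the random seed are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_benchmark(n=2000, x0=0.1):
    """Simulate the [Gordon et al., 1993] benchmark with the noise
    specification given above (v_t ~ N(0, 10), e_t ~ N(1, 1))."""
    x, xs, ys = x0, [], []
    for t in range(1, n + 1):
        u_prev = 8 * np.cos(1.2 * (t - 1))          # u_{t-1} input term
        x = 0.5 * x + 25 * x / (1 + x ** 2) + u_prev \
            + rng.normal(0.0, np.sqrt(10.0))        # process noise v_{t-1}
        ys.append(x ** 2 / 20 + rng.normal(1.0, 1.0))  # measurement noise e_t
        xs.append(x)
    return np.array(xs), np.array(ys)

x_true, y_meas = simulate_benchmark()
```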

Example 3. The model of virus dynamics proposed in [Nowak, 2001] is given by

    dx(t)/dt = λ − d x(t) − β x(t)v(t) + e1(t)
    dy(t)/dt = β x(t)v(t) − a y(t) + e2(t)
    dv(t)/dt = k y(t) − u v(t) + e3(t)

where e1(t), e2(t), e3(t) are disturbances with normal distribution N(0, 10^12). x(t) represents the number of healthy cells, y(t) the number of infected cells, and v(t) the number of free virus. Both y(t) and v(t) are measured. The parameters of interest are

    k = 100,  u = 5

A set of 500 data points is simulated with initial states x(0) = 10^5, y(0) = 10^2, v(0) = 10^3. The sampling time is 0.1. For the UKF estimation, the initial guesses of states and parameters are x(0) = 10^4, y(0) = 0, v(0) = 0, k = 50, u = 10. The estimation results are shown in Fig. 10, where both states and parameters converge to the true values.

Example 4. The following first-principles model is derived for a cytotoxicity process [Huang and


Fig. 11. Parameter and state estimation of cytotoxicity dynamic model (extracellular and intracellular concentrations, cell population, proliferation rate, apoptosis coefficient, and necrosis coefficient).

Xing, 2006]:

    dc_i/dt = k3 (k1 c_e + k2 c_e/(K_i + c_e) − c_i) + v1
    dN/dt = (a0 + a1 c_i + a2 c_e) N + v2

where v1 and v2 are disturbances with normal distribution N(0, 0.01), c_i is the intracellular toxicant concentration, and N is the cell population. The measured variable is the cell population, with measurement noise e ∼ N(0, 0.01). The parameters of interest are the cell proliferation rate a0, the apoptosis coefficient a1, and the necrosis coefficient a2, with the following numerical values:

    a0 = 0.031,  a1 = 1,  a2 = −0.381

A set of 1000 data points is simulated with a random multi-level external perturbation of the extracellular concentration c_e. The initial states are c_i(0) = 0 and N(0) = 1. The discretization time is ∆t = 1/3600 hr = 1 s. The initial guesses of states and parameters for the UKF are c_i(0) = 0.5, N(0) = 0.3, a0 = 0, a1 = 0, a2 = 0. The estimation results are shown in Fig. 11, where all states and parameters converge to the true values.

7. CONCLUSION

In this paper, the nonlinear system identification problem is reviewed under the sequential Bayesian inference framework. The parameter estimation problem is converted to a state estimation problem. The general solution to sequential Bayesian inference is discussed, and the Kalman filter, the extended Kalman filter, and the unscented Kalman filter (UKF) are reviewed. The UKF is further elaborated and explained, with an emphasis on interpreting Bayesian filters and the UKF from an engineering application perspective. Graphical presentations have been used throughout the paper to illustrate several key mathematical concepts in sequential Bayesian inference. Several engineering application examples illustrate the application of the UKF for parameter estimation. These application studies indicate that although the UKF is powerful and simple, additional research is needed to study its effectiveness in estimating extremely nonlinear parameters.

ACKNOWLEDGEMENTS

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada. The support of the research visit by the National University of Singapore and the IEEE Control Systems Chapter, Singapore, is gratefully acknowledged.

REFERENCES

N. Bergman. Recursive Bayesian Estimation: Navigation and Tracking Applications. PhD thesis, Linkoping University, 1999.

S. Boyd and L.O. Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits and Systems, 32:1150-1161, 1985.

W. Chen, B.R. Bakshi, P.K. Goel, and S. Ungarala. Bayesian estimation via sequential Monte Carlo sampling: Unconstrained nonlinear dynamic systems. Ind. Eng. Chem. Res., 2004.

A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer Verlag, 2001.

N.J. Gordon, D.J. Salmond, and A.F.M. Smith. A novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings on Radar and Signal Processing, 140:107-113, 1993.

W. Greblicki. Nonparametric identification of Wiener systems. IEEE Trans. Information Theory, 38:1487-1493, 1992.

W. Greblicki and M. Pawlak. Nonparametric identification of Hammerstein systems. IEEE Trans. Information Theory, 35:409-418, 1989.

F. Gustafsson and P. Hriljac. Particle filters for system identification of state-space models linear in either parameters or states. In Proc. 13th IFAC Symposium on System Identification, pages 1287-1292, Rotterdam, Netherlands, 2003.

B. Huang and S.L. Shah. Closed-loop identification: a two-step approach. Journal of Process Control, 17(6), 1997.

B. Huang and J. Xing. Dynamic modeling and prediction of cytotoxicity on microelectronic cell sensor array. Submitted to Canadian Journal of Chemical Engineering, 2006.

S. Julier and J.K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92:401-422, 2004.

J.C. Kantor. An overview of nonlinear geometrical methods for process control. In D.M. Prett and M. Morari, editors, Shell Process Control Workshop. Butterworths, New York, 1987.

P. Lindskog. Methods, Algorithms and Tools for System Identification Based on Prior Knowledge. PhD thesis, Linkoping University, 1996.

L. Ljung. System Identification. Prentice-Hall, 2nd edition, 1999.

K.R. Muske and T.F. Edgar. Nonlinear state estimation. In M.A. Henson and D.E. Seborg, editors, Nonlinear Process Control, chapter 6, pages 311-370. Prentice Hall, New Jersey, 1997.

A.T. Nelson. Nonlinear Estimation and Modeling of Noisy Time-Series by Dual Kalman Filtering Methods. PhD thesis, Linkoping University, 2000.

M. Norgaard, M. Poulsen, and O. Ravn. New developments in state estimation for nonlinear systems. Automatica, 36:1627-1638, 2000.

M.A. Nowak. Virus Dynamics: Mathematical Principles of Immunology and Virology. Oxford University Press, 2001.

R.K. Pearson and B.A. Ogunnaike. Nonlinear process identification. In M.A. Henson and D.E. Seborg, editors, Nonlinear Process Control, chapter 2, pages 11-110. Prentice Hall, New Jersey, 1997.

A. Sitz, U. Schwarz, J. Kurths, and H.U. Voss. Estimation of parameters and unobserved components for nonlinear systems from noisy time series. Physical Review, 66:016210-1 - 016210-9, 2002.

T. Soderstrom and P. Stoica. System Identification. Prentice Hall International, UK, 1989.

R. van der Merwe. Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. PhD thesis, Oregon Health and Science University, 2004.

P. van Overschee and B.D. Moor. Subspace Identification for Linear Systems. Kluwer Academic Publishers, Boston, 1996.

E. Wan and R. van der Merwe. The unscented Kalman filter. In S. Haykin, editor, Kalman Filtering and Neural Networks, pages 221-280. Wiley, 2001.

E. Wan, R. van der Merwe, and A. Nelson. Dual estimation and the unscented transformation. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, Neural Information Processing Systems 12. MIT Press, 2000.

E.A. Wan and R. van der Merwe. The unscented Kalman filter for nonlinear estimation. In Proceedings of Symposium 2000 on Adaptive Systems for Signal Processing, Communication and Control, Lake Louise, Alberta, Canada, Oct. 2000. IEEE. http://www.ene.unb.br/gaborges/disciplinas/pe/papers/wan2000.pdf.