Nonlinear system identification using two-dimensional wavelet-based state-dependent parameter models


International Journal of Systems Science

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tsys20

Nonlinear system identification using two-dimensional wavelet-based state-dependent parameter models

Nguyen-Vu Truong (a) & Liuping Wang (a)

(a) School of Electrical and Computer Engineering, RMIT University, Melbourne, VIC 3001, Australia

Published online: 28 Oct 2009.

To cite this article: Nguyen-Vu Truong & Liuping Wang (2009) Nonlinear system identification using two-dimensional wavelet-based state-dependent parameter models, International Journal of Systems Science, 40:11, 1161-1180, DOI: 10.1080/00207720902985419

To link to this article: http://dx.doi.org/10.1080/00207720902985419


International Journal of Systems ScienceVol. 40, No. 11, November 2009, 1161–1180

Nonlinear system identification using two-dimensional wavelet-based state-dependent parameter models

Nguyen-Vu Truong and Liuping Wang*

School of Electrical and Computer Engineering, RMIT University, Melbourne, VIC 3001, Australia

(Received 9 August 2008; final version received 8 April 2009)

This article presents a nonlinear system identification approach that uses a two-dimensional (2-D) wavelet-based state-dependent parameter (SDP) model. In this method, differing from our previous approach, the SDP is a function with respect to two different state variables, which is realised by the use of a 2-D wavelet series expansion. Here, an optimised model structure selection is accomplished using a PRESS-based procedure in conjunction with orthogonal decomposition (OD) to avoid any ill-conditioning problems associated with the parameter estimation. Two simulation examples are provided to demonstrate the merits of the proposed approach.

Keywords: nonlinear systems; identification for control; nonlinear system identification; PRESS statistics; multivariable nonlinearities

1. Introduction

The state-dependent parameter (SDP) model structure has been a well-known and natural way to express nonlinear systems (Young 1993, 1998, 2001; Young, McKenna and Bruun 2001; Truong, Wang and Young 2006, 2007b; Truong, Wang and Huang 2007a; Truong and Wang 2008, 2009). This model structure is written in the form of a linear regression in specified state variables (i.e. derivatives or lagged values of the input and output variables), multiplied by associated SDPs, which are functions of the respective state variables, to characterise the nonlinearities.

Previous works on SDP estimation and modelling (Young 1993, 1998, 2001; Young et al. 2001; Truong et al. 2006, 2007a, 2007b; Truong and Wang 2008, 2009) only consider a specific SDP model structure that relies very much on a single state dependency. In the presence of significant interactions between the system’s various input/output terms, a model of this type has limited applications since it cannot represent the multivariable dependence nature of the system’s nonlinear dynamics. Hence, it is valuable to extend the original SDP model of single state dependency to multivariable state dependency to capture such interacted multivariable nonlinearities.

The focus of this work is on the construction of an effective nonlinear system identification technique via the so-called two-dimensional (2-D) SDP (2-DSDP) model, including a systematic approach to the selection of a set of candidate model structures and the final determination of the optimal model itself. This particular model structure refers to a type of SDP model in which the SDP is a function of two different state variables. This, in turn, makes the SDP relationship a surface rather than a curve as in the single state dependency (1-D) case. At this point, the system identification task is to solve the approximation problem of these 2-D functions within the structure of a dynamic 2-DSDP model.

Traditionally, to address this problem, a number of approaches are available in the open literature, employing various types of functions, such as polynomial, spline, kernel and other basis functions (Chen, Billings and Luo 1989; Savakis, Stoughton and Kanetkar 1989; Baudat and Anouar 2001; Gonzalez et al. 2003). In recent years, wavelets have been widely used due to their excellent localization properties in both time and frequency (Chui 1992; Meyer 1992; Mertin 1999). With these properties in association with wavelet multiresolution decomposition, an arbitrary function can be well approximated at any level of regularity and a desired accuracy by a small number of wavelet basis functions. This makes wavelet series expansion outperform many other approximation schemes (Chui 1992), especially in approximating complex functions or functions with sharp discontinuities. Thus, it has become an effective new tool for functional approximation.

The use of 2-D wavelets for nonlinear system identification has been studied (Liu, Billings and Kadirkamanathan 1998; Billings and Wei 2005); however, their application in the context of the 2-DSDP model is new. In the recent papers by Truong et al. (2006, 2007a, 2007b) and Truong and Wang (2008, 2009), a 1-D wavelet-based SDP nonlinear system model was proposed, in which 1-D wavelets are used for the parameterisation of the associated SDPs. This article extends the 1-DSDP model structure to a compact mathematical formulation in a 2-D context, in which the so-called 2-D wavelet series expansion is used for the approximation of the respective 2-DSDP relationships to form a class of nonlinear system models called 2-D wavelet-based SDP (2-DWSDP) models. This is conceptually straightforward: in essence the proposed approach converts a complicated 2-DSDP estimation problem into a much simpler and computationally efficient implementation using wavelets. Models obtained in this manner are very compact, and thus can be used in a much wider range of applications.

*Corresponding author. Email: [email protected]

ISSN 0020–7721 print/ISSN 1464–5319 online

© 2009 Taylor & Francis

DOI: 10.1080/00207720902985419

http://www.informaworld.com

Unlike the estimation of a linear time-invariant model, which has a limited range of candidate model structures, model structure determination in nonlinear system identification is a challenging task in its own right. First, the set of candidate model structures is required to be determined before the estimation. Here a novel approach is proposed based on the characteristics of wavelet functions. The selection of the scaling factors (finest and coarsest) for a wavelet series expansion is crucial. It determines the amount of information (i.e. regressor matrix) to be included for the functional approximation, which in essence is related to the set of the candidate models and to both the approximation performance and the computational efficiency of the model structure selection algorithm. In the 1-D case (Truong et al. 2006, 2007a, 2007b; Truong and Wang 2008), this information was obtained from the non-parametrically estimated SDP relationships, whereas, in the 2-DWSDP model situation, this information is not available. In this article, new results on the selection of those scaling parameters in the context of the 2-D wavelet series expansion and 2-DWSDP model setting are developed to enhance the procedure of selecting candidate model structures. Second, based on this selected set of candidate model structures, the optimal structure of a 2-DWSDP model, along with its parameters, is chosen using the PRESS (Prediction Error Sum of Squares) criterion as described in our previous papers (Truong et al. 2006, 2007a, 2007b; Truong and Wang 2008, 2009). Furthermore, since orthogonal decomposition (OD) is used in the PRESS computation, it enables the algorithm to eliminate any numerically ill-conditioned terms within a given candidate model structure (Hong, Harris, Chen and Sharkey 2003a; Hong, Sharkey and Warwick 2003b, 2003c; Billings and Wei 2008). This further enhances the performance and efficiency of this model structure selection algorithm.

The structure of this article is outlined as follows. Section 2 introduces and discusses the 2-D wavelet functional approximation as well as the associated 2-DWSDP model. The selection of candidate structures is discussed in Section 3. Section 4 describes the nonlinear model structure selection procedure using the PRESS criterion and the forward regression, and summarises the identification procedure using the proposed approach. Section 5 presents two simulation examples to illustrate the efficiency of the proposed technique. Finally, Section 6 concludes this article.

2. 2-DWSDP model

It is assumed that a nonlinear system can be represented by the following 2-DSDP model:

$$y(k) = \sum_{q=1}^{n_y} f_q(x_{m_q,n_q})\, y(k-q) + \sum_{q=0}^{n_u} g_q(x_{l_q,p_q})\, u(k-q) + e(k) \tag{1}$$

where $u(k)$ and $y(k)$ are, respectively, the sampled input and output sequences, while $\{n_u, n_y\}$ refer to the maximum number of lagged inputs and outputs. The functions $f_q$, $g_q$ are dependent on $x_{m_q,n_q} = (x_{m_q}, x_{n_q} \mid m_q \neq n_q \in x)$ and $x_{l_q,p_q} = (x_{l_q}, x_{p_q} \mid l_q \neq p_q \in x)$, in which $x = \{y(k-1), \ldots, y(k-n_y), u(k), \ldots, u(k-n_u)\}$. As a result, they are regarded as 2-DSDPs. Finally, $e(k)$ refers to the noise variable, assumed initially to be a zero-mean, white noise process that is uncorrelated with the input $u(k)$ and its past values.

For example, a first-order 2-DSDP model representation of a nonlinear system can take the following form:

$$y(k) = f_1[y(k-1), u(k)]\, y(k-1) + g_0[u(k), u(k-1)]\, u(k) \tag{2}$$

Let $x = \{x_1, x_2, x_3\} = \{y(k-1), u(k), u(k-1)\}$. In this case, the 2-DSDPs $f_1$ and $g_0$ are dependent on $x_{1,2} = \{x_1, x_2\} = \{y(k-1), u(k)\}$ and $x_{2,3} = \{x_2, x_3\} = \{u(k), u(k-1)\}$, respectively.
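To make the first-order form (2) concrete, the following minimal sketch simulates such a system. The two surfaces `f1` and `g0` are hypothetical choices made purely for illustration (they are not taken from the article), as are the input signal and the zero initial condition.

```python
import numpy as np

def f1(y_prev, u_now):
    # Hypothetical 2-D SDP depending on {y(k-1), u(k)}; not from the article.
    return 0.5 * np.exp(-y_prev**2) * np.cos(u_now)

def g0(u_now, u_prev):
    # Hypothetical 2-D SDP depending on {u(k), u(k-1)}; not from the article.
    return 1.0 / (1.0 + u_now**2 + u_prev**2)

def simulate(u, y0=0.0):
    """Noise-free simulation of y(k) = f1[y(k-1), u(k)] y(k-1) + g0[u(k), u(k-1)] u(k)."""
    y = np.zeros(len(u))
    y[0] = y0
    for k in range(1, len(u)):
        y[k] = f1(y[k - 1], u[k]) * y[k - 1] + g0(u[k], u[k - 1]) * u[k]
    return y

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=200)   # assumed excitation signal
y = simulate(u)
```

Each coefficient multiplying a lagged signal is itself a surface over two state variables, which is what distinguishes the 2-DSDP form from the single-dependency case.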

2.1. 2-D wavelet series expansion

In the case of 2-D wavelet series expansion, the wavelet basis function is no longer single dimensional but varies with respect to two different variables, i.e. $x_1$ and $x_2$. To formulate a 2-D wavelet basis function $\psi^{[2]}(x_1, x_2)$, a natural approach is based on the tensor product of two 1-D wavelet functions $\psi(x_1)$ and $\psi(x_2)$ (Liu et al. 1998; Billings and Wei 2005) as follows:

$$\psi^{[2]}(x_1, x_2) = \psi(x_1)\,\psi(x_2) \tag{3}$$

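For example, let $\psi(x)$ be a 1-D Mexican hat wavelet as described in the following equation:

$$\psi(x) = \begin{cases} (1 - x^2)\, e^{-0.5x^2} & \text{if } x \in (-4, 4) \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Then its 2-D version (shown in Figure 1) takes the following form:

$$\psi^{[2]}(x_1, x_2) = \begin{cases} (1 - x_1^2)(1 - x_2^2)\, e^{-0.5(x_1^2 + x_2^2)} & \text{if } x_1, x_2 \in (-4, 4) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$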
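The truncated Mexican hat wavelet of (4) and its tensor-product extension (5) can be evaluated numerically as in the sketch below. The function names are illustrative only; the last helper applies the dyadic dilation $2^{-i}$ and the integer translations $(j_1, j_2)$ used by the 2-D series expansion.

```python
import numpy as np

def mexican_hat(x, support=(-4.0, 4.0)):
    """1-D Mexican hat wavelet of (4); zero outside the open support interval."""
    x = np.asarray(x, dtype=float)
    inside = (x > support[0]) & (x < support[1])
    return np.where(inside, (1.0 - x**2) * np.exp(-0.5 * x**2), 0.0)

def mexican_hat_2d(x1, x2):
    """2-D wavelet of (5), built as the tensor product (3)."""
    return mexican_hat(x1) * mexican_hat(x2)

def wavelet_2d(x1, x2, i, j1, j2):
    """Dilated and translated 2-D wavelet: psi2(2^-i x1 - j1, 2^-i x2 - j2)."""
    return mexican_hat_2d(2.0**(-i) * x1 - j1, 2.0**(-i) * x2 - j2)
```

Because the 2-D function is a plain product, it vanishes as soon as either argument leaves $(-4, 4)$, which is what later bounds the translation indices.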

Let $f^{[2]}$ be the associated 2-DSDP relationship to be approximated with respect to two different state variables $(x_1, x_2)$, represented by a 2-D wavelet series expansion of the following form:

$$f^{[2]}(x_1, x_2) = \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_1}} \sum_{j_2 \in L_{ix_2}} a_{i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_1, x_2) \tag{6}$$

and

$$\psi^{[2]}_{i,j_1,j_2}(x_1, x_2) = \psi^{[2]}(2^{-i}x_1 - j_1,\ 2^{-i}x_2 - j_2) \tag{7}$$

Here, $\{a_{i,j_1,j_2}\}$ is the set of coefficients of the expansion; $i_{\min}$ and $i_{\max}$ correspond to the minimum and maximum scales used for the approximation of $f^{[2]}(x_1, x_2)$. $L_{ix_1}$, $L_{ix_2}$ (determined as in (8) and (9)) are the translation libraries with respect to $\psi(x_1)$, $\psi(x_2)$ at scale $i$, respectively. They are derived by using the compact support conditions of the mother wavelet (see Section 3.1 for details):

$$L_{ix_1} = \{ j \in (2^{-i}x_{1\min} - s_2,\ 2^{-i}x_{1\max} - s_1),\ j \in \mathbb{Z} \} \tag{8}$$

$$L_{ix_2} = \{ j \in (2^{-i}x_{2\min} - s_2,\ 2^{-i}x_{2\max} - s_1),\ j \in \mathbb{Z} \} \tag{9}$$

where $(s_1, s_2)$ is the supporting range of the mother wavelet.¹ For example, for the Mexican hat wavelet as in (5), $s_1 = -4$ and $s_2 = 4$.

Since $i_{\min}$ and $i_{\max}$ determine the set of terms used for the approximation, the next question to address here is how to select these parameters in a 2-D wavelet-based context. This will be discussed and illustrated in Section 3.
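As a small illustration of (8) and (9), the sketch below lists the integer translations that fall in the open interval for a given scale. The function name and the example range $[-2, 2]$ are assumptions for illustration; the default support $(-4, 4)$ is that of the Mexican hat wavelet.

```python
import math

def translation_library(i, x_min, x_max, s1=-4.0, s2=4.0):
    """Integers j in the open interval (2^-i x_min - s2, 2^-i x_max - s1)."""
    lo = 2.0**(-i) * x_min - s2
    hi = 2.0**(-i) * x_max - s1
    j_first = math.floor(lo) + 1   # smallest integer strictly greater than lo
    j_last = math.ceil(hi) - 1     # largest integer strictly less than hi
    return list(range(j_first, j_last + 1))

# Hypothetical state variable ranging over [-2, 2]:
L0 = translation_library(0, -2.0, 2.0)   # j = -5, ..., 5
L1 = translation_library(1, -2.0, 2.0)   # j = -4, ..., 4
```

At a given scale, the number of 2-D terms in (6) is the product of the two library sizes, so the choice of scales directly controls the size of the candidate set.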

2.2. 2-DWSDP model formulation

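Based on this formulation, the model structure of (1) is parameterised using a 2-D wavelet series expansion as in (6), where the 2-DSDPs $f_q(x_{m_q,n_q})$ and $g_q(x_{l_q,p_q})$ can be approximated as

$$f_q(x_{m_q,n_q}) = \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{m_q}}} \sum_{j_2 \in L_{ix_{n_q}}} a_{f_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{m_q,n_q}) \tag{10}$$

$$g_q(x_{l_q,p_q}) = \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{l_q}}} \sum_{j_2 \in L_{ix_{p_q}}} b_{g_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{l_q,p_q}) \tag{11}$$

in which $\{L_{ix_{m_q}}, L_{ix_{n_q}}\}$ and $\{L_{ix_{l_q}}, L_{ix_{p_q}}\}$ correspond to the translation libraries with respect to $\{\psi(x_{m_q}), \psi(x_{n_q})\}$ and $\{\psi(x_{l_q}), \psi(x_{p_q})\}$ at scale $i$; $a_{f_q,i,j_1,j_2}$ and $b_{g_q,i,j_1,j_2}$ are the coefficients.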

Figure 1. 2-D Mexican hat wavelet function.


Substituting (10) and (11) into (1), we obtain a 2-DWSDP model as follows:

$$\begin{aligned} y(k) ={} & \sum_{q=1}^{n_y} \left[ \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{m_q}}} \sum_{j_2 \in L_{ix_{n_q}}} a_{f_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{m_q,n_q}) \right] y(k-q) \\ & + \sum_{q=0}^{n_u} \left[ \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{l_q}}} \sum_{j_2 \in L_{ix_{p_q}}} b_{g_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{l_q,p_q}) \right] u(k-q) + e(k) \end{aligned} \tag{12}$$

In this 2-DWSDP model, the parameters are the coefficients of the respective 2-D wavelets, i.e. $a_{f_q,i,j_1,j_2}$ and $b_{g_q,i,j_1,j_2}$. Given the wavelet basis functions, i.e. $\psi^{[2]}(x_1, x_2)$, and $\{n_y, n_u\}$ as well as the scaling parameters $\{i_{\min}, i_{\max}\}$, the next task here is to formulate (12) as a linear-in-the-parameter regression equation, starting from the inner-most summation ($j_2$) to the outer-most summation ($q$).

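Let the inner-most coefficients and wavelet basis functions be represented in vector forms as follows:

$$\left\{ \begin{aligned} \alpha_{f_q,j_1} &= \left[ a_{f_q,i,j_1,j_{2\min}}, \ldots, a_{f_q,i,j_1,j_{2\max}} \right]^T_{j_2 \in L_{ix_{n_q}}} \\ \Psi_{f_q,j_1}(k) &= \left[ \psi^{[2]}_{i,j_1,j_{2\min}}[x_{m_q,n_q}(k)], \ldots, \psi^{[2]}_{i,j_1,j_{2\max}}[x_{m_q,n_q}(k)] \right]_{j_2 \in L_{ix_{n_q}}} \end{aligned} \right\} \tag{13}$$

and

$$\left\{ \begin{aligned} \beta_{g_q,j_1} &= \left[ b_{g_q,i,j_1,j_{2\min}}, \ldots, b_{g_q,i,j_1,j_{2\max}} \right]^T_{j_2 \in L_{ix_{p_q}}} \\ \Psi_{g_q,j_1}(k) &= \left[ \psi^{[2]}_{i,j_1,j_{2\min}}[x_{l_q,p_q}(k)], \ldots, \psi^{[2]}_{i,j_1,j_{2\max}}[x_{l_q,p_q}(k)] \right]_{j_2 \in L_{ix_{p_q}}} \end{aligned} \right\} \tag{14}$$

Note that as defined in (13) and (14), $\alpha_{f_q,j_1}$ and $\beta_{g_q,j_1}$ are the parameter vectors which are functions of $j_1$, while $\Psi_{f_q,j_1}(k)$ and $\Psi_{g_q,j_1}(k)$ are functions of $\{j_1, x_{m_q,n_q}(k)\}$ and $\{j_1, x_{l_q,p_q}(k)\}$, respectively. Then,

$$\sum_{j_2 \in L_{ix_{n_q}}} a_{f_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{m_q,n_q}) = \Psi_{f_q,j_1}(k)\, \alpha_{f_q,j_1} \tag{15}$$

$$\sum_{j_2 \in L_{ix_{p_q}}} b_{g_q,i,j_1,j_2}\, \psi^{[2]}_{i,j_1,j_2}(x_{l_q,p_q}) = \Psi_{g_q,j_1}(k)\, \beta_{g_q,j_1} \tag{16}$$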

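As a result, (12) can be simplified into:

$$y(k) = \sum_{q=1}^{n_y} \left[ \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{m_q}}} \Psi_{f_q,j_1}(k)\, \alpha_{f_q,j_1} \right] y(k-q) + \sum_{q=0}^{n_u} \left[ \sum_{i=i_{\min}}^{i_{\max}} \sum_{j_1 \in L_{ix_{l_q}}} \Psi_{g_q,j_1}(k)\, \beta_{g_q,j_1} \right] u(k-q) + e(k) \tag{17}$$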

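Similarly, let

$$\left\{ \begin{aligned} A_{f_q,L_i} &= \left[ \alpha^T_{f_q,j_{1\min}}, \ldots, \alpha^T_{f_q,j_{1\max}} \right]^T_{j_1 \in L_{ix_{m_q}}} \\ Z_{f_q,L_i}(k) &= \left[ \Psi_{f_q,j_{1\min}}(k), \ldots, \Psi_{f_q,j_{1\max}}(k) \right]_{j_1 \in L_{ix_{m_q}}} y(k-q) \end{aligned} \right\} \tag{18}$$

and

$$\left\{ \begin{aligned} B_{g_q,L_i} &= \left[ \beta^T_{g_q,j_{1\min}}, \ldots, \beta^T_{g_q,j_{1\max}} \right]^T_{j_1 \in L_{ix_{l_q}}} \\ Z_{g_q,L_i}(k) &= \left[ \Psi_{g_q,j_{1\min}}(k), \ldots, \Psi_{g_q,j_{1\max}}(k) \right]_{j_1 \in L_{ix_{l_q}}} u(k-q) \end{aligned} \right\} \tag{19}$$

in which $L_i$ refers to the whole translation library at scale $i$. As defined in (18) and (19), $Z_{f_q,L_i}(k)$ and $Z_{g_q,L_i}(k)$ are functions of $\{x_{m_q,n_q}(k), y(k-q)\}$ and $\{x_{l_q,p_q}(k), u(k-q)\}$, respectively.

Substituting (18) and (19) into (17), $y(k)$ is expressed as

$$y(k) = \sum_{q=1}^{n_y} \left[ \sum_{i=i_{\min}}^{i_{\max}} Z_{f_q,L_i}(k)\, A_{f_q,L_i} \right] + \sum_{q=0}^{n_u} \left[ \sum_{i=i_{\min}}^{i_{\max}} Z_{g_q,L_i}(k)\, B_{g_q,L_i} \right] + e(k) \tag{20}$$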

Now let

$$\left\{ \begin{aligned} A_q &= \left[ A^T_{f_q,L_{i_{\min}}}, \ldots, A^T_{f_q,L_{i_{\max}}} \right]^T \\ Z_{f_q}(k) &= \left[ Z_{f_q,L_{i_{\min}}}(k), \ldots, Z_{f_q,L_{i_{\max}}}(k) \right] \\ B_q &= \left[ B^T_{g_q,L_{i_{\min}}}, \ldots, B^T_{g_q,L_{i_{\max}}} \right]^T \\ Z_{g_q}(k) &= \left[ Z_{g_q,L_{i_{\min}}}(k), \ldots, Z_{g_q,L_{i_{\max}}}(k) \right] \end{aligned} \right\} \tag{21}$$

leading to the linear-in-the-parameter regression equation:

$$y(k) = \sum_{q=1}^{n_y} \left[ Z_{f_q}(k)\, A_q \right] + \sum_{q=0}^{n_u} \left[ Z_{g_q}(k)\, B_q \right] + e(k) \tag{22}$$

In Equation (22), the parameter matrices $A_q$ and $B_q$ are to be estimated from the experimental data, and $Z_{f_q}(k)$, $Z_{g_q}(k)$, which are the wavelet terms, are constructed from experimental input–output data.

To integrate (22) with measured input and output data, we assume that $y(0), y(1), \ldots, y(N-1)$ and $u(0), u(1), \ldots, u(N-1)$ are available.

With

$$Y = [y(0), \ldots, y(N-1)]^T \tag{23}$$

$$U = [u(0), \ldots, u(N-1)]^T \tag{24}$$

$$\Xi = [e(0), \ldots, e(N-1)]^T \tag{25}$$

write (22) in the matrix form as

$$Y = \sum_{q=1}^{n_y} Z_{f_q} A_q + \sum_{q=0}^{n_u} Z_{g_q} B_q + \Xi \tag{26}$$

where

$$\left\{ \begin{aligned} Z_{f_q} &= \left[ Z^T_{f_q}(0), \ldots, Z^T_{f_q}(N-1) \right]^T \\ Z_{g_q} &= \left[ Z^T_{g_q}(0), \ldots, Z^T_{g_q}(N-1) \right]^T \end{aligned} \right\} \tag{27}$$

Let us define

$$\left\{ \begin{aligned} A &= \left[ A_1^T, \ldots, A_{n_y}^T \right]^T \\ Z_f &= [Z_{f_1}, \ldots, Z_{f_{n_y}}] \\ B &= \left[ B_0^T, \ldots, B_{n_u}^T \right]^T \\ Z_g &= [Z_{g_0}, \ldots, Z_{g_{n_u}}] \end{aligned} \right\} \tag{28}$$

Substituting (28) into (26), we obtain

$$Y = Z_f A + Z_g B + \Xi \tag{29}$$

As a result, (12) is written in the matrix form as

$$Y = P\Theta + \Xi \tag{30}$$

where $P$ is the data matrix and $\Theta$ is the parameter vector to be estimated, and

$$\left\{ \begin{aligned} P &= [Z_f, Z_g] \\ \Theta &= [A^T, B^T]^T \end{aligned} \right\} \tag{31}$$
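The construction of the data matrix $P$ can be sketched as follows for the first-order example (2), with one block for $f_1[y(k-1), u(k)]\,y(k-1)$ and one for $g_0[u(k), u(k-1)]\,u(k)$. This is only an illustrative sketch: the helper names, the placeholder data record and the scales $i \in \{1, 2\}$ are assumptions, and the state ranges are simply read from the data.

```python
import math
import numpy as np

def mexican_hat(x):
    # Mexican hat wavelet truncated to the open support (-4, 4).
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 4.0, (1.0 - x**2) * np.exp(-0.5 * x**2), 0.0)

def library(i, x_min, x_max, s1=-4.0, s2=4.0):
    # Integer translations in the open interval (2^-i x_min - s2, 2^-i x_max - s1).
    lo, hi = 2.0**(-i) * x_min - s2, 2.0**(-i) * x_max - s1
    return range(math.floor(lo) + 1, math.ceil(hi))

def sdp_block(xa, xb, regressor, i_min, i_max):
    """Columns psi2_{i,j1,j2}(xa, xb) * regressor over all scales and translations."""
    cols = []
    for i in range(i_min, i_max + 1):
        for j1 in library(i, xa.min(), xa.max()):
            for j2 in library(i, xb.min(), xb.max()):
                psi = mexican_hat(2.0**(-i) * xa - j1) * mexican_hat(2.0**(-i) * xb - j2)
                cols.append(psi * regressor)
    return np.column_stack(cols)

rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 100)
y = rng.uniform(-1.0, 1.0, 100)            # placeholder output record, illustration only
y1, u0, u1 = y[:-1], u[1:], u[:-1]         # y(k-1), u(k), u(k-1) for k = 1, ..., N-1

Zf = sdp_block(y1, u0, y1, i_min=1, i_max=2)   # block for f1[y(k-1), u(k)] y(k-1)
Zg = sdp_block(u0, u1, u0, i_min=1, i_max=2)   # block for g0[u(k), u(k-1)] u(k)
P = np.hstack([Zf, Zg])                        # data matrix of the form P = [Zf, Zg]
```

Even for this small case $P$ has more columns than rows, which illustrates why the full expansion is treated as an overparameterised candidate set rather than estimated directly.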

3. Selection of candidate structures

One of the keys in nonlinear system identification is to effectively select candidate structures. This is among the most challenging tasks due to the infinite possible combinations of nonlinear regression terms. Therefore, it is critical, at the first step, to reduce the set of candidate structures to a manageable size based on some known characteristics about the system under study. This reduces the computational load and improves the efficiency of the optimised model structure selection algorithm.

In the situation of 2-DSDP models as considered in this article, the finest and coarsest scaling parameters $i_{\min}$, $i_{\max}$ determine the set of terms as well as their associated characteristics² used for the approximation of the respective 2-DSDP relationship (i.e. $f_1(x_1, x_2)$ via a 2-D wavelet series expansion as described in Section 2.1). As a result, they play an important role in the selection of candidate model structures for the nonlinear system identification. If $i_{\min}$ and $i_{\max}$ are properly selected and a compactly supported mother wavelet is chosen, the set of candidate structures is now limited and deterministic.

The aim of this section is to derive criteria to guide the selection of these parameters based on the available information obtained from the input–output data as well as the wavelet basis functions.

Based on the formulation of 2-D wavelets (3) as well as the 2-D wavelet series expansion (6), a 2-DSDP $f_1(x_1, x_2)$ can be represented in the following tensor product:

$$f_1(x_1, x_2) = h_1(x_1)\, h_2(x_2)$$

Furthermore, by approximating $h_1(x_1)$ and $h_2(x_2)$ using the following equations via 1-D wavelet series expansion:

$$h_1(x_1) = \sum_{i=i_{x_1\min}}^{i_{x_1\max}} \sum_{j \in L_i} c_{h_1,i,j}\, \psi_{i,j}(x_1) \tag{32}$$

$$h_2(x_2) = \sum_{i=i_{x_2\min}}^{i_{x_2\max}} \sum_{j \in L_i} c_{h_2,i,j}\, \psi_{i,j}(x_2) \tag{33}$$

the problem is now separated into two sub-problems in which we independently examine the selection of scaling parameters, $[i_{x_1\min}, i_{x_1\max}]$ and $[i_{x_2\min}, i_{x_2\max}]$, for the wavelet-based series expansion of two unknown 1-D functions, $h_1(x_1)$ and $h_2(x_2)$. In this manner, the scaling parameters $i_{\min}$ and $i_{\max}$ used for the 2-D wavelet series expansion of $f_1(x_1, x_2)$ can be selected as:

$$i_{\min} = \mathrm{Min}(i_{x_1\min}, i_{x_2\min}) \tag{34}$$

$$i_{\max} = \mathrm{Max}(i_{x_1\max}, i_{x_2\max}) \tag{35}$$

Therefore, the question to be addressed here is how to select the associated finest and coarsest scaling parameters, $i_{x\min}$ and $i_{x\max}$, for the wavelet series expansion of an unknown 1-D function $f(x)$ based on the known characteristics of the state variable $x$ and the wavelet basis function $\psi(x)$.

3.1. On the selection of scaling parameters

A 1-D function $f(x)$ is represented by the following 1-D wavelet series expansion

$$f(x) = \sum_{i \in \mathbb{Z}} \sum_{j \in \mathbb{Z}} d_{i,j}\, \psi_{i,j}(x) \tag{36}$$

in which

$$\psi_{i,j}(x) = \psi(2^{-i}x - j) \tag{37}$$

By limiting the scaling factor $i$ to be bounded within a range of $(i_{x\min}, i_{x\max})$, Equation (36) is approximated by the following equation:

$$f(x) = \sum_{i=i_{x\min}}^{i_{x\max}} \sum_{j \in \mathbb{Z}} c_{i,j}\, \psi_{i,j}(x) \tag{38}$$

Furthermore, if $\psi(x)$ is compactly supported within $(s_1, s_2)$, the translation parameter $j$ is bounded by the inequality:

$$s_1 < 2^{-i}x - j < s_2 \tag{39}$$

This inequality is regarded as the compactly supported condition of the mother wavelet $\psi(x)$ given by

$$\left\{ \begin{aligned} \psi_{i,j}(x) &\neq 0 \quad \text{if } s_1 < 2^{-i}x - j < s_2 \\ \psi_{i,j}(x) &= 0 \quad \text{otherwise} \end{aligned} \right\} \tag{40}$$

From (39), we have

$$2^{-i}x_{\min} - s_2 < 2^{-i}x - s_2 < j < 2^{-i}x - s_1 < 2^{-i}x_{\max} - s_1 \tag{41}$$

As a result,

$$2^{-i}x_{\min} - s_2 < j < 2^{-i}x_{\max} - s_1 \tag{42}$$

Define

$$L_i = \{ j \in (2^{-i}x_{\min} - s_2,\ 2^{-i}x_{\max} - s_1),\ j \in \mathbb{Z} \} \tag{43}$$

which is regarded as the translation library at scale $i$. As a result, we obtain

$$\left\{ \begin{aligned} \psi_{i,j}(x) &\neq 0 \quad \text{if } j \in L_i \\ \psi_{i,j}(x) &= 0 \quad \text{if } j \notin L_i \end{aligned} \right\} \tag{44}$$

Using (44), Equation (38) is equivalent to the following equation:

$$f(x) = \sum_{i=i_{x\min}}^{i_{x\max}} \left[ \sum_{j \in L_i} c_{i,j}\, \psi_{i,j}(x) + \sum_{j \notin L_i} c_{i,j}\, \psi_{i,j}(x) \right] = \sum_{i=i_{x\min}}^{i_{x\max}} \sum_{j \in L_i} c_{i,j}\, \psi_{i,j}(x) \tag{45}$$

since

$$\psi_{i,j}(x) = 0 \quad \text{when } j \notin L_i \tag{46}$$

Now the next question to be addressed is how to select the scaling parameters $[i_{x\min}, i_{x\max}]$.

Let us define the wavelet function library $L_W$ used for the functional approximation of $f(x)$ as

$$L_W = \{ \psi_{i,j}(x),\ (i_{x\min} \le i \le i_{x\max},\ i \in \mathbb{Z}) \text{ and } (j \in L_i) \} \tag{47}$$

$$L_{i_{x\max}} \subseteq L_{i_{x\max}-1} \subseteq \cdots \subseteq L_{i_{x\min}} \tag{48}$$

As shown earlier (see (43) and (47)), the wavelet function library $L_W$ is determined based on the values of $\{x_{\min}, x_{\max}, s_1, s_2, i_{x\min}, i_{x\max}\}$, in which the interval $(s_1, s_2)$ is the supporting range of the mother wavelet $\psi(x)$ (Figure 2c), and the interval $(x_{\min}, x_{\max})$ is the range of the state variable $x$ (Figure 2a).

The selection of $[i_{x\min}, i_{x\max}]$ determines the amount (number of terms) and characteristics of the information included in the wavelet function library $L_W$ for the functional approximation. This is crucial as it is directly related to both the approximation performance and the computational efficiency of the model structure selection algorithm. In the following, we discuss and derive the criteria to guide the selection of $i_{x\min}$ and $i_{x\max}$ based on $\{x_{\min}, x_{\max}\}$ and $\{s_1, s_2\}$, which are given information and can be determined directly from the data as well as the mother wavelet $\psi(x)$.

Let $\mathrm{Int}(x)$ denote the integer part of a scalar variable $x$. We assume that $\psi(x)$ is chosen so that

$$\left\{ \begin{aligned} & s_1 < 0 \ \text{and} \ s_2 > 0 \\ & |s_1 - \mathrm{Int}(s_1)| < 2^{-1} \\ & |s_2 - \mathrm{Int}(s_2)| < 2^{-1} \end{aligned} \right\}$$

and $s_1 \le x_{\min} < 0$ and $0 < x_{\max} \le s_2$.

Under these assumptions, criteria to guide the selection of $i_{x\min}$ and $i_{x\max}$ are developed as described in the following Lemma.

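Lemma 1: Under the earlier assumptions, the following results hold:

$$\left\{ \begin{aligned} & i_{x\min}, i_{x\max} \in \mathbb{Z}:\ i_{x\min} \le i_{x\max} \\ & i_{x\min} > \mathrm{Max}\left( \frac{\log\left(\frac{x_{\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{\min}}{s_1}\right)}{\log 2} \right) \\ & i_c \ge i_{x\max} > \mathrm{Max}\left( \frac{\log(2x_{\max})}{\log 2},\ \frac{\log|2x_{\min}|}{\log 2} \right) \end{aligned} \right\}$$

where $i_c$ refers to the scaling parameter such that for all $i \ge i_c$, $\psi(2^{-i}x)$ is assumed to be constant.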

Proof: From (39) and (42), we have

$$s_1 < 2^{-i}x - j < s_2 \tag{49}$$

$$2^{-i}x_{\min} - s_2 < j < 2^{-i}x_{\max} - s_1 \tag{50}$$

where $j$ is the translation index, the interval $(s_1, s_2)$ is the wavelet’s supporting range and the interval $(x_{\min}, x_{\max})$ is the range of the state variable $x$.

For this 2-D problem, we fix one dimension ($j$) to obtain the criterion for selecting $i_{x\min}$ and $i_{x\max}$. The determination of translation indices $j$ is then automatically obtained corresponding to the respective selection of $i_{x\min}$ and $i_{x\max}$ as in (42).

From (50), $j$ is determined by

$$j \in L_i = \{ (2^{-i}x_{\min} - s_2,\ 2^{-i}x_{\max} - s_1),\ j \in \mathbb{Z} \} \supset \{0\}$$

Because (49) is true for all $j \in L_i$, when $j = 0$

$$s_1 < 2^{-i}x < s_2$$

This indicates

$$\mathrm{Max}\{2^{-i}x\} = 2^{-i}x_{\max} < s_2$$

$$\mathrm{Min}\{2^{-i}x\} = 2^{-i}x_{\min} > s_1$$

Because $\{x_{\max} > 0, s_2 > 0\}$ and $\{x_{\min} < 0, s_1 < 0\}$, then

$$i > \frac{\log\left(\frac{x_{\max}}{s_2}\right)}{\log 2} \quad \text{and} \quad i > \frac{\log\left(\frac{x_{\min}}{s_1}\right)}{\log 2}$$

Thus,

$$i > \mathrm{Max}\left( \frac{\log\left(\frac{x_{\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{\min}}{s_1}\right)}{\log 2} \right)$$

As a result,

$$i_{x\min} > \mathrm{Max}\left( \frac{\log\left(\frac{x_{\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{\min}}{s_1}\right)}{\log 2} \right) \tag{51}$$

Also, from (50), it is observed that

$$L_{i_M} = L_{i_M+1} = \cdots = L_{i_M+\infty}$$

if

$$2^{-i_M}x_{\max} < 2^{-1} \quad \text{and} \quad |2^{-i_M}x_{\min}| < 2^{-1}$$

or

$$i_M > \frac{\log(2x_{\max})}{\log 2} \quad \text{and} \quad i_M > \frac{\log|2x_{\min}|}{\log 2}$$

$$i_M > \mathrm{Max}\left( \frac{\log(2x_{\max})}{\log 2},\ \frac{\log|2x_{\min}|}{\log 2} \right)$$

Hence, $i_{x\max}$ can be chosen so that

$$i_{x\max} > \mathrm{Max}\left( \frac{\log(2x_{\max})}{\log 2},\ \frac{\log|2x_{\min}|}{\log 2} \right) \tag{52}$$

Figure 2. On the selection of scaling parameters.



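Furthermore, when $i$ increases, $\psi(2^{-i}x)$ is widely stretched. Therefore, there exists $i_c \in \mathbb{Z}$ so that for all $i \ge i_c$, $\psi(2^{-i}x)$ is assumed to be constant. Since there is no benefit in continuing to add constant terms to the approximation function library, $i_c$ is chosen to be the upper bound for $i_{x\max}$. Consequently, criteria for selecting $i_{x\min}$ and $i_{x\max}$ are derived as

$$\left\{ \begin{aligned} & i_{x\min}, i_{x\max} \in \mathbb{Z}:\ i_{x\min} \le i_{x\max} \\ & i_{x\min} > \mathrm{Max}\left( \frac{\log\left(\frac{x_{\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{\min}}{s_1}\right)}{\log 2} \right) \\ & i_c \ge i_{x\max} > \mathrm{Max}\left( \frac{\log(2x_{\max})}{\log 2},\ \frac{\log|2x_{\min}|}{\log 2} \right) \end{aligned} \right\} \tag{53}$$

□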

The interpretation as well as the application of the earlier results in the context of nonlinear system identification via 2-DWSDP models will be illustrated in the simulation examples (Section 5), particularly through Example 1.

Remark 2: Selecting $i_{x\min}$ and $i_{x\max}$ according to (53) automatically fixes the translation indices $j$ beforehand through (42). This means that a fixed wavelet function library is deterministically established.

Remark 3: If $x_{\min} = 0$ or $x_{\max} = 0$, the log function in (53) is undefined. In such cases, without loss of generality, we replace $x_{\min}$ by a small negative value $-\varepsilon$ (e.g. $\varepsilon = 10^{-2}$), or replace $x_{\max}$ by a small positive value $\varepsilon$ (e.g. $\varepsilon = 10^{-2}$), to satisfy the assumptions.

Remark 4: If $x_{\min} \notin [s_1, 0]$ or $x_{\max} \notin [0, s_2]$, we can always convert $x$ to the range $(s_1, s_2)$ to satisfy the assumptions.

Remark 5: Note that for a Mexican hat wavelet as in (4), $i_c$ is determined to be 5.
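As a hand-worked illustration of (53), assume (hypothetically, this range is not taken from the article) a state variable with $x_{\min} = -2$ and $x_{\max} = 2$, and the Mexican hat wavelet with $s_1 = -4$, $s_2 = 4$ and $i_c = 5$ (Remark 5). The assumptions $s_1 \le x_{\min} < 0$ and $0 < x_{\max} \le s_2$ hold, and the bounds evaluate to

$$\begin{aligned} i_{x\min} &> \mathrm{Max}\left( \frac{\log(2/4)}{\log 2},\ \frac{\log\big((-2)/(-4)\big)}{\log 2} \right) = -1 \quad\Rightarrow\quad i_{x\min} \ge 0, \\ 5 = i_c \ge i_{x\max} &> \mathrm{Max}\left( \frac{\log(2 \cdot 2)}{\log 2},\ \frac{\log|2 \cdot (-2)|}{\log 2} \right) = 2 \quad\Rightarrow\quad i_{x\max} \in \{3, 4, 5\}. \end{aligned}$$

Any integer pair with $0 \le i_{x\min} \le i_{x\max}$ and $3 \le i_{x\max} \le 5$ therefore satisfies the criteria for this assumed range.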

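As a result, the criteria to guide the selection of the scaling parameters $i_{\min}$ and $i_{\max}$ in the context of a 2-D wavelet series expansion of $f_1(x_1, x_2)$ can be derived as follows:

$$i_{\min} = \mathrm{Min}(i_{x_1\min}, i_{x_2\min}) \tag{54}$$

$$i_{\max} = \mathrm{Max}(i_{x_1\max}, i_{x_2\max}) \tag{55}$$

Applying (53), we obtain:

$$\left\{ \begin{aligned} & i_{x_1\min}, i_{x_1\max} \in \mathbb{Z}:\ i_{x_1\min} \le i_{x_1\max} \\ & i_{x_1\min} > \mathrm{Max}\left( \frac{\log\left(\frac{x_{1\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{1\min}}{s_1}\right)}{\log 2} \right) \\ & i_c \ge i_{x_1\max} > \mathrm{Max}\left( \frac{\log(2x_{1\max})}{\log 2},\ \frac{\log|2x_{1\min}|}{\log 2} \right) \end{aligned} \right\} \tag{56}$$

$$\left\{ \begin{aligned} & i_{x_2\min}, i_{x_2\max} \in \mathbb{Z}:\ i_{x_2\min} \le i_{x_2\max} \\ & i_{x_2\min} > \mathrm{Max}\left( \frac{\log\left(\frac{x_{2\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{2\min}}{s_1}\right)}{\log 2} \right) \\ & i_c \ge i_{x_2\max} > \mathrm{Max}\left( \frac{\log(2x_{2\max})}{\log 2},\ \frac{\log|2x_{2\min}|}{\log 2} \right) \end{aligned} \right\} \tag{57}$$

As a consequence,

$$\left\{ \begin{aligned} & i_{\min}, i_{\max} \in \mathbb{Z}:\ i_{\min} \le i_{\max} \\ & i_{\min} > \mathrm{Min}\left[ \mathrm{Max}\left( \frac{\log\left(\frac{x_{1\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{1\min}}{s_1}\right)}{\log 2} \right),\ \mathrm{Max}\left( \frac{\log\left(\frac{x_{2\max}}{s_2}\right)}{\log 2},\ \frac{\log\left(\frac{x_{2\min}}{s_1}\right)}{\log 2} \right) \right] \\ & i_c \ge i_{\max} > \mathrm{Max}\left( \frac{\log(2x_{1\max})}{\log 2},\ \frac{\log|2x_{1\min}|}{\log 2},\ \frac{\log(2x_{2\max})}{\log 2},\ \frac{\log|2x_{2\min}|}{\log 2} \right) \end{aligned} \right\} \tag{58}$$

This will be illustrated in the simulation examples (Section 5).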
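The bounds in (58) are straightforward to evaluate from the data ranges. The sketch below returns the smallest admissible integers for $i_{\min}$ and $i_{\max}$; the function names, the example ranges and the choice of returning the smallest admissible values are assumptions made for illustration, since (58) only constrains the scales and leaves the final choice to the user. The defaults correspond to the Mexican hat wavelet ($s_1 = -4$, $s_2 = 4$, $i_c = 5$).

```python
import math

def scale_bounds_1d(x_min, x_max, s1, s2, eps=1e-2):
    """Strict lower bounds on i_xmin and i_xmax from (53) for one state variable."""
    # Remark 3: replace a zero endpoint by a small nonzero value so the logs exist.
    x_min = -eps if x_min == 0 else x_min
    x_max = eps if x_max == 0 else x_max
    bound_min = max(math.log2(x_max / s2), math.log2(x_min / s1))
    bound_max = max(math.log2(2 * x_max), math.log2(abs(2 * x_min)))
    return bound_min, bound_max

def smallest_int_above(v):
    # Smallest integer strictly greater than v.
    return math.floor(v) + 1

def select_scales_2d(range1, range2, s1=-4.0, s2=4.0, i_c=5):
    """Smallest integers (i_min, i_max) satisfying the strict bounds in (58)."""
    lo1, hi1 = scale_bounds_1d(range1[0], range1[1], s1, s2)
    lo2, hi2 = scale_bounds_1d(range2[0], range2[1], s1, s2)
    i_min = smallest_int_above(min(lo1, lo2))
    i_max = smallest_int_above(max(hi1, hi2))
    if i_max > i_c:
        raise ValueError("no admissible i_max below i_c; rescale the data (Remark 4)")
    return i_min, i_max

# Hypothetical ranges: x1 in [-2, 2], x2 in [-1, 0.5]
scales = select_scales_2d((-2.0, 2.0), (-1.0, 0.5))
```

For these assumed ranges the result is $i_{\min} = -1$ and $i_{\max} = 3$; any $i_{\max}$ up to $i_c$ is also admissible and enlarges the candidate library.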

4. Model structure determination using PRESS

The 2-DWSDP model as derived in (30) is often overparameterised as it may contain significant redundancies in the model representation. With these redundancies, the data matrix is often numerically ill-conditioned, leading to a number of disadvantages in both computation and efficiency associated with the parameter estimation.

The principle of a model structure determination algorithm lies in the selection of a final model structure which is simple but adequate to explain the essentials of the underlying system dynamics. The key here is to justify the significance of each term within the original overparameterised model based on a criterion, and determine which terms need to be included in the final model.

An efficient model structure determination approach based on the PRESS criterion and forward regression has been studied in previous works (Truong et al. 2006, 2007a, 2007b; Truong and Wang 2008, 2009). This approach uses the incremental value of PRESS³ (ΔPRESS) as the criterion to detect the significance of each term within the overparameterised model: the maximum ΔPRESS signifies the most significant term, while its minimum reflects the least significant term. Based on this, the algorithm begins with the initial subset being the most significant term. It then grows to include the subsequent significant terms in a forward regression manner until a specified performance is achieved. Furthermore, since OD is used in the PRESS computation (Hong et al. 2003a, 2003b, 2003c; Billings and Wei 2008), it enables the algorithm to eliminate any numerical ill-conditioning associated with the parameter estimation. This further enhances the performance and efficiency of this model structure selection algorithm.

Upon determining the optimised nonlinear model structure for the overparameterised representation as in (30), the final identified model structure is generally found to be

$$y(k) = \sum_{q=1}^{n_y} \left[ \sum_{j=1}^{n_{f_q}} a_{q,j}\, \varphi^{[2]}_{q,j}(x_{m_q,n_q}) \right] y(k-q) + \sum_{q=0}^{n_u} \left[ \sum_{j=1}^{n_{g_q}} b_{q,j}\, \phi^{[2]}_{q,j}(x_{l_q,p_q}) \right] u(k-q) + e(k) \tag{59}$$

Similarly, we can write (59) in the following matrix form

$$Y = L\theta + \Xi \tag{60}$$

where

$$\begin{aligned} Y &= [y(0), y(1), \ldots, y(N-1)]^T \\ U &= [u(0), u(1), \ldots, u(N-1)]^T \\ \Xi &= [e(0), e(1), \ldots, e(N-1)]^T \\ A_q &= [a_{q,1}, a_{q,2}, \ldots, a_{q,n_{f_q}}] \\ B_q &= [b_{q,1}, b_{q,2}, \ldots, b_{q,n_{g_q}}] \\ \theta &= [A_1, A_2, \ldots, A_{n_y}, B_0, B_1, \ldots, B_{n_u}]^T \\ L_{f_q}(k) &= \left[ \varphi^{[2]}_{q,1}[x_{m_q,n_q}(k)], \ldots, \varphi^{[2]}_{q,n_{f_q}}[x_{m_q,n_q}(k)] \right] y(k-q) \\ L_{g_q}(k) &= \left[ \phi^{[2]}_{q,1}[x_{l_q,p_q}(k)], \ldots, \phi^{[2]}_{q,n_{g_q}}[x_{l_q,p_q}(k)] \right] u(k-q) \\ L_k &= [L_{f_1}(k), \ldots, L_{f_{n_y}}(k), L_{g_0}(k), \ldots, L_{g_{n_u}}(k)]^T \\ L &= [L_0, \ldots, L_{N-1}]^T \end{aligned} \tag{61}$$

Now define the cost function

$$J = [Y - L\theta]^T [Y - L\theta] \tag{62}$$

and solve for the parameter vector $\theta$ that minimises $J$,

$$\hat{\theta} = (L^T L)^{-1} L^T Y \tag{63}$$

This estimate is based on the least squares approach, which has optimal statistical properties if $e(k)$ is a zero-mean, normally distributed white noise process that is independent of the input signal $u(k)$. The consistency of the parameter estimates will be numerically investigated and discussed through the simulation examples. However, depending on the nature of the data, this assumption might not hold. In such a case, other estimation approaches might be necessary, such as an instrumental variable (IV) approach, which can be used for the parameter estimation in this model setting (Truong and Wang 2008).
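As an illustration only, the least squares estimate (63) could be computed as in the following sketch. The function name `least_squares_estimate` and the synthetic regressor matrix are assumptions made for this example; they are not part of the original work. The sketch uses a numerically stable solver rather than forming the inverse explicitly, and compares the result with the normal-equation form of (63).

```python
import numpy as np

def least_squares_estimate(L, Y):
    """Return the theta that minimises J = (Y - L theta)^T (Y - L theta)."""
    theta, *_ = np.linalg.lstsq(L, Y, rcond=None)
    return theta

# Synthetic data (hypothetical): 200 samples, 3 regressors, small white noise.
rng = np.random.default_rng(0)
L = rng.normal(size=(200, 3))
theta_true = np.array([0.5, -1.0, 2.0])
Y = L @ theta_true + 0.01 * rng.normal(size=200)

theta_hat = least_squares_estimate(L, Y)

# Normal-equation form of (63), solved without an explicit matrix inverse.
theta_normal = np.linalg.solve(L.T @ L, L.T @ Y)
```

When `L` is well conditioned the two results agree to numerical precision; the solver-based form is usually preferred when the columns of `L` are nearly collinear.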

4.1. Identification procedure

The overall nonlinear system identification using the

proposed approach can be summarised into the

following steps.

(1) Determining the 2-DSDP model’s initial conditions. This includes the following:

(a) Select the initial model orders, which normally start with low values of $n_y$ and $n_u$.

(b) Based on the available a priori knowledge, select the significant variables from all the candidate lagged output and input terms ($y(k-1), \ldots, y(k-n_y), u(k), \ldots, u(k-n_u)$) and the significant 2-D state dependencies ($f_q(x_{m_q,n_q})$, $g_q(x_{l_q,p_q})$) formulated by the selected significant variables. Note that the a priori knowledge can be some known structural characteristics, or can be based on some hypothesis and assumption made about the system under study.

(c) Otherwise, if there is no a priori knowledge available, all the possible variables as well as their associated possible 2-D dependencies for the selected model order ($n_y$ and $n_u$) need to be considered. For example, if $n_y = 1$ and $n_u = 1$, the possible variables are $y(k-1)$, $u(k)$ and $u(k-1)$, leading to the possible 2-D dependencies between $\{y(k-1), u(k)\}$, $\{y(k-1), u(k-1)\}$ and $\{u(k), u(k-1)\}$.

(2) 2-DWSDP’s optimised model structure selection. This involves the following steps:

(a) Based on the features of the considered data and the selected wavelet basis function, determine the associated scaling parameters $[i_{\min}, i_{\max}]$ to be used for the 2-DSDP parameterisation using (58).

(b) Formulate an overparameterised 2-DWSDP model by expanding all the 2-DSDPs (i.e. $f_q(x_{m_q,n_q})$, $g_q(x_{l_q,p_q})$) via 2-D wavelet series expansion using the selected scaling parameters $[i_{\min}, i_{\max}]$.



(c) Using the PRESS-based selection algorithm, determine an optimised model structure from the candidate model terms.

(3) Final parametric optimisation.

• Using the measured data, estimate the associated parameters via a least squares algorithm.

(4) Model validation.

• If the values of $n_y$ and $n_u$ selected in Step 1 provide a satisfactory performance over the considered data, terminate the procedure.

• Otherwise, increase the model’s order, i.e. $n_y = n_y + 1$ and/or $n_u = n_u + 1$, and repeat Steps 1(b) and 2–4.
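Step 1(c) of the procedure above can be sketched in a few lines. The helper names `candidate_variables` and `candidate_dependencies` are hypothetical and are introduced only for this illustration; the sketch simply enumerates every unordered pair of candidate variables for a given model order.

```python
from itertools import combinations

def candidate_variables(ny, nu):
    """List the lagged outputs y(k-1..ny) and the inputs u(k), u(k-1..nu) as labels."""
    outputs = [f"y(k-{q})" for q in range(1, ny + 1)]
    inputs = ["u(k)"] + [f"u(k-{q})" for q in range(1, nu + 1)]
    return outputs + inputs

def candidate_dependencies(ny, nu):
    """Enumerate all unordered pairs of candidate variables (possible 2-D dependencies)."""
    return list(combinations(candidate_variables(ny, nu), 2))

# First-order case discussed in the text: ny = 1, nu = 1.
pairs = candidate_dependencies(1, 1)
```

For $n_y = 1$ and $n_u = 1$ this yields the three pairs named in Step 1(c). The number of pairs grows quadratically with the number of candidate variables, which is one reason for starting with low model orders.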

5. Examples

To demonstrate the merits of the proposed approach, two examples are provided in this section. For simplicity, a 2-D Mexican hat wavelet function, as defined in (5), is used throughout this section; it is very easy to calculate and carries a very small computational load.

To facilitate a direct comparison between the estimated and actual nonlinear functions, we first start with a simulation example. The second example studies the identification of a continuous stirred tank reactor (CSTR), which is among the most common types of chemical and petrochemical reactors. In these examples, the proposed technique is also compared to a polynomial-based approach.

5.1. Example 1

Consider a nonlinear system described by the following equation:

$$
y(k) = -y(k-1)^2\,u(k)\,e^{-0.5[y(k-1)^2 + u(k)^2]} + u(k)\,u(k-1)^3\,e^{-0.5[u(k)^2 + u(k-1)^2]} + e(k) \tag{64}
$$

in which the input signal is $u(k) = \sin(k/50)$ and $e(k)$ is a white noise sequence, uniformly distributed within $[-0.045, 0.045]$.

With zero initial conditions, (64) is simulated to generate 1000 data samples for system identification, as shown in Figure 3.
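A minimal simulation sketch of (64) is given below for illustration. It assumes that the leading term of (64) carries a negative sign, that the sample index starts at $k = 0$, and that the noise is drawn from NumPy's default generator; the function name `simulate_example1` and the `seed` argument are hypothetical.

```python
import numpy as np

def simulate_example1(n_samples=1000, noise_bound=0.045, seed=0):
    """Simulate (64) with u(k) = sin(k/50) and uniform white noise e(k)."""
    rng = np.random.default_rng(seed)
    k = np.arange(n_samples)
    u = np.sin(k / 50.0)
    e = rng.uniform(-noise_bound, noise_bound, size=n_samples)
    y = np.zeros(n_samples)
    y_prev, u_prev = 0.0, 0.0  # zero initial conditions
    for i in range(n_samples):
        # f1[y(k-1), u(k)] * y(k-1); the leading minus sign is an assumed reading of (64)
        f_term = -y_prev**2 * u[i] * np.exp(-0.5 * (y_prev**2 + u[i]**2))
        # g1[u(k), u(k-1)] * u(k-1)
        g_term = u[i] * u_prev**3 * np.exp(-0.5 * (u[i]**2 + u_prev**2))
        y[i] = f_term + g_term + e[i]
        y_prev, u_prev = y[i], u[i]
    return u, y

u, y = simulate_example1()
```

Changing `seed` produces a different noise realisation, which is the mechanism a Monte Carlo study of the parameter estimates would rely on.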

With the assumption that there is no a priori knowledge available, a first-order 2-DSDP model ($n_y = 1$, $n_u = 1$) is used for the identification of the system. In this situation, the possible variables are $y(k-1)$, $u(k)$ and $u(k-1)$. This leads to the possible 2-D dependencies between $\{y(k-1), u(k)\}$, $\{y(k-1), u(k-1)\}$ and $\{u(k), u(k-1)\}$. Consequently, the 2-DWSDP model structure used for the identification of this system is of the following form:

$$
y(k) = f_1[y(k-1), u(k)]\,y(k-1) + g_0[y(k-1), u(k-1)]\,u(k) + g_1[u(k), u(k-1)]\,u(k-1) \tag{65}
$$

5.1.1. Selection of scaling parameters

Since (65) consists of three 2-DSDPs, $f_1[y(k-1), u(k)]$, $g_0[y(k-1), u(k-1)]$ and $g_1[u(k), u(k-1)]$, the scaling parameters for each respective 2-DSDP need to be determined.

First, the selection of the scaling parameters used for the 2-D wavelet series expansion of $f_1[y(k-1), u(k)]$ is considered. From (53), the following set of inequalities is obtained.

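$$
\left\{
\begin{aligned}
& i_{y\min},\ i_{y\max} \in \mathbb{Z}:\ i_{y\min} \le i_{y\max} \\
& i_{y\min} > \mathrm{Max}\left(\frac{\log(y_{\max}/s_2)}{\log 2},\ \frac{\log(y_{\min}/s_1)}{\log 2}\right) \\
& i_c \ge i_{y\max} > \mathrm{Max}\left(\frac{\log(2y_{\max})}{\log 2},\ \frac{\log|2y_{\min}|}{\log 2}\right)
\end{aligned}
\right\} \tag{66}
$$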

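With $s_2 = 4$, $s_1 = -4$, $y_{\max} = \max[y(k-1)] = 0.5029$, $y_{\min} = \min[y(k-1)] = -\varepsilon = -0.01$ and, in particular, $i_c = 5$, we obtain

$$
i_{y\min} > \mathrm{Max}\left(\frac{\log(0.5029/4)}{\log 2},\ \frac{\log(-10^{-2}/{-4})}{\log 2}\right) = -2.99 \tag{67}
$$

$$
i_c \ge i_{y\max} > \mathrm{Max}\left(\frac{\log(2 \times 0.5029)}{\log 2},\ \frac{\log|-2 \times 10^{-2}|}{\log 2}\right) = 0.48 \tag{68}
$$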

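As a result,

$$
\left\{
\begin{aligned}
& i_{y\min},\ i_{y\max} \in \mathbb{Z}:\ i_{y\min} \le i_{y\max} \\
& i_{y\min} > -2.99 \\
& 5 = i_c \ge i_{y\max} > 0.48
\end{aligned}
\right\} \tag{69}
$$

Similarly,

$$
\left\{
\begin{aligned}
& i_{u\min},\ i_{u\max} \in \mathbb{Z}:\ i_{u\min} \le i_{u\max} \\
& i_{u\min} > \mathrm{Max}\left(\frac{\log(1/4)}{\log 2},\ \frac{\log(1/4)}{\log 2}\right) = -2 \\
& 5 \ge i_{u\max} > \mathrm{Max}\left(\frac{\log 2}{\log 2},\ \frac{\log 2}{\log 2}\right) = 1
\end{aligned}
\right\} \tag{70}
$$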

Inequalities (69) and (70) lead to

$$
\left\{
\begin{aligned}
& -2 \le i_{y\min} \le i_{y\max} \\
& 1 \le i_{y\max} \le 5
\end{aligned}
\right\} \tag{71}
$$



and

$$
\left\{
\begin{aligned}
& -1 \le i_{u\min} \le i_{u\max} \\
& 2 \le i_{u\max} \le 5
\end{aligned}
\right\} \tag{72}
$$

Inequalities (71) and (72) define the bounds for $i_{f_1\min} = \mathrm{Min}(i_{y\min}, i_{u\min})$ and $i_{f_1\max} = \mathrm{Max}(i_{y\max}, i_{u\max})$. The selection of $i_{f_1\min}$ and $i_{f_1\max}$ needs to be strictly within these bounds, for the following reasons:

• If $i_{f_1\max}$ is greater than its upper bound, a large number of unnecessary constant terms are added into the function library for the approximation of $f_1[y(k-1), u(k)]$.

• If $i_{f_1\min}$ is smaller than its lower bound, a larger number of unnecessary high-frequency wavelet terms are added into the function library. For instance, choosing $i_{y\min} = i_{u\min} = -3$ in this example adds an extra 231 unnecessary 2-D wavelet terms into the function library for the approximation of $f_1[y(k-1), u(k)]$. As the model structure here consists of three 2-DSDPs, about 693 unnecessary candidate model terms are added, leading to a significant growth in the overparameterised model. This directly affects the accuracy and efficiency of the model structure selection algorithm.

Using (71) and (72), let us choose $i_{y\min} = i_{u\min} = -1$ and $i_{y\max} = i_{u\max} = 2$; then the finest and coarsest scaling factors used for the 2-D wavelet series expansion of $f_1[y(k-1), u(k)]$ are $i_{f_1\min} = \mathrm{Min}(i_{y\min}, i_{u\min}) = -1$ and $i_{f_1\max} = \mathrm{Max}(i_{y\max}, i_{u\max}) = 2$.

Similarly, for the expansion of $g_0[y(k-1), u(k-1)]$ and $g_1[u(k), u(k-1)]$, $[i_{g_0\min}, i_{g_0\max}] = [i_{g_1\min}, i_{g_1\max}]$ is selected to be $[-1, 2]$. As a result, the overall finest and coarsest scaling factors used for the identification of this system are selected to be $[i_{\min}, i_{\max}] = [-1, 2]$. Using this information, the expansion of all the 2-DSDPs results in a total of 432 candidate model terms.
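The lower-bound arithmetic in (67) and (70) can be checked numerically with a short sketch. The function name `finest_scale_lower_bound` is hypothetical, and the input range of $[-1, 1]$ is an assumption that follows from $u(k) = \sin(k/50)$; only the lower bound on the finest scale is reproduced here.

```python
import math

def finest_scale_lower_bound(x_max, x_min, s1=-4.0, s2=4.0):
    """Lower bound on the finest scale: Max(log2(x_max/s2), log2(x_min/s1))."""
    return max(math.log2(x_max / s2), math.log2(x_min / s1))

def smallest_admissible_integer(bound):
    """Smallest integer strictly greater than the bound."""
    return math.floor(bound) + 1

bound_y = finest_scale_lower_bound(0.5029, -0.01)  # output range used in (67)
bound_u = finest_scale_lower_bound(1.0, -1.0)      # assumed input range of sin(k/50)

i_ymin_lowest = smallest_admissible_integer(bound_y)
i_umin_lowest = smallest_admissible_integer(bound_u)
```

Under these assumptions the bounds evaluate to about $-2.99$ and exactly $-2$, so the smallest admissible integers are $-2$ and $-1$, in agreement with (71) and (72).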

5.1.2. Identification results

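Using the PRESS-based selection algorithm to choose the significant model terms, the final identified model is found to be

$$
\begin{aligned}
y(k) ={}& \left[0.3727\,\psi^{[2]}_{0,1,-1}(x_1, x_2)\right]_{\{y(k-1),\,u(k)\}} y(k-1) \\
&+ \left[
\begin{aligned}
& 1.0097\,\psi^{[2]}_{1,1,0}(x_1, x_2) \\
& + 0.2585\,\psi^{[2]}_{0,1,-1}(x_1, x_2) \\
& + 0.4769\,\psi^{[2]}_{0,-1,0}(x_1, x_2) \\
& - 0.0076\,\psi^{[2]}_{1,0,0}(x_1, x_2)
\end{aligned}
\right]_{\{u(k),\,u(k-1)\}} u(k-1)
\end{aligned} \tag{73}
$$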

Figure 3. Example 1 data: (a) output and (b) input, plotted against the sampling index.



in which

$$
\left[\psi^{[2]}_{i,j_1,j_2}(x_1, x_2)\right]_{\{c(k),\,d(k)\}} = \psi^{[2]}_{i,j_1,j_2}[c(k), d(k)] \tag{74}
$$

$$
\psi^{[2]}(x_1, x_2) = (1 - x_1^2)(1 - x_2^2)\,e^{-0.5(x_1^2 + x_2^2)} \tag{76}
$$
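For illustration, the mother wavelet in (76) can be evaluated with the short sketch below. The function name `mexican_hat_2d` is hypothetical. The dilated and translated family $\psi^{[2]}_{i,j_1,j_2}$ is deliberately not implemented, since its definition is not reproduced in this section.

```python
import numpy as np

def mexican_hat_2d(x1, x2):
    """2-D Mexican hat mother wavelet of (76): (1 - x1^2)(1 - x2^2) exp(-0.5 (x1^2 + x2^2))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return (1.0 - x1**2) * (1.0 - x2**2) * np.exp(-0.5 * (x1**2 + x2**2))

# Evaluate on a small grid; the function is bounded and decays quickly away from the origin.
grid = np.linspace(-4.0, 4.0, 81)
X1, X2 = np.meshgrid(grid, grid)
values = mexican_hat_2d(X1, X2)
```

The function equals 1 at the origin, vanishes on the lines $x_1 = \pm 1$ and $x_2 = \pm 1$, and is symmetric in its two arguments.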

Table 1 shows the incremental values $\Delta\mathrm{PRESS}_k$ that result from excluding the associated terms from the model. As discussed earlier, this value reflects the significance of each term in the model’s parameterisation. The most significant term corresponds to the maximum $\Delta\mathrm{PRESS}_k$ (5.7681) and is ranked 1 in Table 1. The least significant term corresponds to the minimum $\Delta\mathrm{PRESS}_k$ (0.022) and is ranked 5 in Table 1.

To validate the identified model (73), we generate a new data set, $k = 1001$–$2000$. Figure 4 shows the comparison between the model’s iterative (simulated) output (see Note 4) and the actual noise-free output over the validation data set, as well as their associated residual. They are almost identical. Figure 5 compares the estimated 2-DSDPs ($\hat{f}_1(x_1, x_2)$ and $\hat{g}_1(x_1, x_2)$) to the actual functions ($f_1(x_1, x_2)$ and $g_1(x_1, x_2)$), which match each other very well. Together, these results imply that the identified 5-term model (Equation (73)) characterises this system very well, in the sense that the actual system’s dynamics are efficiently captured.

To further investigate the consistency property of the proposed approach in this particular example, a Monte Carlo simulation consisting of 100 independent tests has been implemented. In this test, the realisation of the noise is varied by changing the ‘seed’ element of the random noise generator from 0 to 99. In each independent test, a set of input–output data is generated by simulating (64), but with a different noise sequence. The results are tabulated in Table 2, which demonstrates that the parameter estimates

Figure 4. Example 1: (a) comparison between the actual output (solid) and model iterative output of (73) (dot-dot) over the validation set and (b) their associated residual.

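Table 1. Example 1: $\Delta\mathrm{PRESS}$ table.

| Term index $k$ | Model’s term | $\Delta\mathrm{PRESS}_k$ | Rank |
|---|---|---|---|
| 1 | $\psi^{[2]}_{0,1,-1}[y(k-1), u(k)]\,y(k-1)$ | 0.1155 | 4 |
| 2 | $\psi^{[2]}_{1,1,0}[u(k), u(k-1)]\,u(k-1)$ | 5.7681 | 1 |
| 3 | $\psi^{[2]}_{0,1,-1}[u(k), u(k-1)]\,u(k-1)$ | 0.1383 | 3 |
| 4 | $\psi^{[2]}_{0,-1,0}[u(k), u(k-1)]\,u(k-1)$ | 0.8027 | 2 |
| 5 | $\psi^{[2]}_{1,0,0}[u(k), u(k-1)]\,u(k-1)$ | 0.022 | 5 |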



obtained in this example are quite consistent and very close to the noise-free estimates.
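The term ranking reported in Table 1 relies on PRESS differences (Note 3). A minimal sketch of such a computation is shown below. It assumes the standard leave-one-out identity for linear least squares, uses a QR factorisation as a stand-in for the orthogonal decomposition, and takes the difference as PRESS without the term minus PRESS of the full model. The function names and the synthetic data are hypothetical, and this is not the authors' selection algorithm.

```python
import numpy as np

def press(L, Y):
    """Leave-one-out PRESS of the least squares fit Y ~ L theta."""
    Q, _ = np.linalg.qr(L)                 # orthonormal basis of the column space of L
    leverage = np.sum(Q**2, axis=1)        # diagonal of the hat matrix
    residual = Y - Q @ (Q.T @ Y)           # ordinary least squares residual
    return float(np.sum((residual / (1.0 - leverage))**2))

def delta_press(L, Y):
    """PRESS increase caused by excluding each column (term) of L in turn."""
    full = press(L, Y)
    return np.array([press(np.delete(L, k, axis=1), Y) - full
                     for k in range(L.shape[1])])

# Synthetic illustration: one strong term, one weak term, one irrelevant term.
rng = np.random.default_rng(1)
L = rng.normal(size=(60, 3))
Y = 2.0 * L[:, 0] + 0.3 * L[:, 1] + 0.05 * rng.normal(size=60)

d = delta_press(L, Y)
ranking = np.argsort(-d) + 1  # term indices ordered from most to least significant
```

A larger value indicates a term whose removal degrades the leave-one-out prediction more, which matches the way the ranks in Table 1 are read.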

5.1.3. In comparison to a polynomial-based approach

For comparison, a polynomial-based approach is used to parameterise the respective 2-DSDP relationship.

Using this approach, the above system is identified as

$$
\begin{aligned}
y(k) ={}& \left[0.6578\,x_1 x_2^4 + 0.0781\,x_2^2 - 0.3156\,x_2^4 + 0.6541\right]_{\{y(k-1),\,u(k)\}} y(k-1) \\
&+ \left[0.9113\,x_1^4 x_2^3 + 0.2526\,x_1^3 - 1.0414\,x_1^3 x_2^4 - 0.0084\right]_{\{u(k),\,u(k-1)\}} u(k-1)
\end{aligned} \tag{77}
$$

To facilitate the comparison between the proposed method and this polynomial-based approach, a measurement index, the absolute error, is used:

$$
E_k = |y(k) - \hat{y}(k)| \tag{78}
$$

in which $y(k)$ and $\hat{y}(k)$ correspond to the actual noise-free data and the model’s simulated output over the testing data set.

Let $E^{\mathrm{wavelet}}_k$ and $E^{\mathrm{poly}}_k$ denote the absolute errors resulting from using (73) and (77), respectively. They are compared in Figure 6, in which $E^{\mathrm{wavelet}}_k$ (solid line) is much smaller than $E^{\mathrm{poly}}_k$ (dot-dot line). This implies that, in this example, the proposed approach is more advantageous than the considered polynomial-based approach.

Figure 7 compares the functions estimated using the polynomial approach, $\hat{f}^{\mathrm{poly}}_1(x_1, x_2)$ and $\hat{g}^{\mathrm{poly}}_1(x_1, x_2)$, with the actual functions $f_1(x_1, x_2)$ and $g_1(x_1, x_2)$. The gap between $\hat{f}^{\mathrm{poly}}_1(x_1, x_2)$ and $f_1(x_1, x_2)$ shown in Figure 7(a) indicates significant bias in the parameter estimates for the polynomial-based model (77) under the considered noise level.

Another disadvantage of this polynomial approach is demonstrated in Figure 7(b), in which $\hat{g}^{\mathrm{poly}}_1(x_1, x_2)$ exhibits significant oscillatory and overshoot behaviour. This is a limitation of high-order polynomials in approximating complicated functions like the ones considered in this example. By contrast, the proposed wavelet-based approach has provided well-localised solutions which closely approximate the actual 2-D dependencies (Figure 5). That is due to the excellent

Figure 5. Example 1: (a) $f_1(x_1, x_2)$ (solid) vs. $\hat{f}_1(x_1, x_2)$ (dot-dash) and (b) $g_1(x_1, x_2)$ (solid) vs. $\hat{g}_1(x_1, x_2)$ (dot-dash).

Table 2. Example 1: Monte Carlo test.

| Term index $k$ | Noise-free estimate | Noise-disturbed estimate | |
|---|---|---|---|
| 1 | 0.3667 | 0.3657 | 0.0237 |
| 2 | 1.0115 | 1.0116 | 0.0103 |
| 3 | 0.2663 | 0.2665 | 0.0165 |
| 4 | 0.4843 | 0.4830 | 0.0132 |
| 5 | −0.0041 | −0.0042 | 0.0062 |



localisation properties of wavelet basis functions.

Additionally, the bounded characteristics of wavelet

basis functions can be very useful for the stability

analysis of the identified models using the proposed

approach.

5.2. Example 2

The three most commonly used reactors are batch or semi-batch reactors (BR), CSTRs, and tubular or plug flow reactors (PR). In this example, the identification of a CSTR is studied.

Figure 6. Example 1: $E^{\mathrm{wavelet}}_k$ (solid) vs. $E^{\mathrm{poly}}_k$ (dot-dot), plotted against the sampling index.

Figure 7. Example 1: (a) $f_1(x_1, x_2)$ (solid) vs. $\hat{f}^{\mathrm{poly}}_1(x_1, x_2)$ (dot-dash) and (b) $g_1(x_1, x_2)$ (solid) vs. $\hat{g}^{\mathrm{poly}}_1(x_1, x_2)$ (dot-dash).



The CSTR is among the most commonly used reactor types in chemical and petrochemical plants. It consists of a tank, a stirring mechanism and feed pumps (Figure 8). Within a CSTR, two chemicals are mixed and reacted to produce a product compound at a concentration $C_a(t)$ and a mixture temperature $T(t)$. This reaction is irreversible and exothermic, occurring in a constant-volume reactor that is cooled by a single coolant stream at a flow rate $q_c(t)$. The coolant flow rate varies the heat produced by the reaction, and thus influences the product concentration. This process is highly nonlinear and its mathematical model is given as a set of differential equations:

$$
\begin{aligned}
\dot{C}_a(t) &= \frac{q}{\nu}\left[C_{a0} - C_a(t)\right] - k_0 C_a(t)\,e^{-E/RT(t)} \\
\dot{T}(t) &= \frac{q}{\nu}\left[T_0 - T(t)\right] + k_1 C_a(t)\,e^{-E/RT(t)} + k_2 q_c(t)\left[1 - e^{-k_3/q_c(t)}\right]\left[T_{c0} - T(t)\right]
\end{aligned} \tag{79}
$$

in which $C_{a0}$ is the inlet feed concentration, $q$ denotes the process flow rate, and $T_0$ and $T_{c0}$ represent the inlet feed and coolant temperatures, respectively. These parameters are assumed to be constant at their nominal values. $E/R$, $\nu$, $k_1 = -\Delta H k_0/\rho C_p$, $k_2 = \rho_c C_{pc}/\rho C_p \nu$ and $k_3 = h_a/\rho_c C_{pc}$ are thermodynamic and chemical constants relating to this particular problem. The nominal values for this plant are given in Table 3.
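As a hedged illustration, the right-hand side of (79) can be coded with the nominal values of Table 3, together with a single classical Runge-Kutta step. The inlet feed concentration `CA0 = 1.0` mol/L is an assumption (it is not listed in Table 3), as are the operating point used in the example, the integration scheme, and all function and constant names.

```python
import math

# Nominal values from Table 3
Q = 100.0           # process flow rate, L/min
VOL = 100.0         # reactor volume, L
K0 = 7.2e10         # reaction rate constant, 1/min
E_OVER_R = 1.0e4    # activation energy term E/R, K
T0 = 350.0          # feed temperature, K
TC0 = 350.0         # inlet coolant temperature, K
DELTA_H = -2.0e5    # heat of reaction, cal/mol
CP = CPC = 1.0      # specific heats, cal/g/K
RHO = RHOC = 1.0e3  # liquid densities, g/L
HA = 7.0e5          # heat transfer coefficient, cal/min/K
CA0 = 1.0           # ASSUMPTION: inlet feed concentration in mol/L, not given in Table 3

# Derived constants as defined after (79)
K1 = -DELTA_H * K0 / (RHO * CP)
K2 = RHOC * CPC / (RHO * CP * VOL)
K3 = HA / (RHOC * CPC)

def cstr_rhs(ca, temp, qc):
    """Right-hand side of (79): returns (dCa/dt, dT/dt)."""
    arrhenius = math.exp(-E_OVER_R / temp)
    dca = Q / VOL * (CA0 - ca) - K0 * ca * arrhenius
    dtemp = (Q / VOL * (T0 - temp)
             + K1 * ca * arrhenius
             + K2 * qc * (1.0 - math.exp(-K3 / qc)) * (TC0 - temp))
    return dca, dtemp

def rk4_step(ca, temp, qc, dt):
    """One classical Runge-Kutta step with the coolant flow rate held constant."""
    s1 = cstr_rhs(ca, temp, qc)
    s2 = cstr_rhs(ca + 0.5 * dt * s1[0], temp + 0.5 * dt * s1[1], qc)
    s3 = cstr_rhs(ca + 0.5 * dt * s2[0], temp + 0.5 * dt * s2[1], qc)
    s4 = cstr_rhs(ca + dt * s3[0], temp + dt * s3[1], qc)
    ca_next = ca + dt / 6.0 * (s1[0] + 2.0 * s2[0] + 2.0 * s3[0] + s4[0])
    temp_next = temp + dt / 6.0 * (s1[1] + 2.0 * s2[1] + 2.0 * s3[1] + s4[1])
    return ca_next, temp_next

# Hypothetical operating point close to a steady state for qc = 100 L/min.
ca1, temp1 = rk4_step(0.1, 438.54, 100.0, 0.1)
```

Holding $q_c$ constant over each sampling interval is a design choice of this sketch; a stiff solver or a smaller internal step may be preferable if the operating range is widened.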

This system set-up has been studied in Lightbody and Irwin (1997), where a neural network was used to model the plant as a third-order nonlinear model, i.e.

$$
C_a(k) = f[C_a(k-1), C_a(k-2), C_a(k-3), q_c(k-1), q_c(k-2), q_c(k-3)] \tag{80}
$$

In this example, using the same system set-up, we demonstrate that the plant dynamics can be captured very well and represented in a compact manner using a simpler first-order 2-DWSDP model. This illustrates the effectiveness and advantages of the developed approach.

5.2.1. Identification results

In this study, the identification data are obtained by simulating (79) using the nominal values tabulated in Table 3. With the input $q_c(k)$ varied between $q_{c\min} = 90$ L/min and $q_{c\max} = 111$ L/min (Figure 9(b)) and a sampling interval of $\Delta t = 0.1$ min, 750 min worth of simulated data (7500 samples) are obtained, as shown in Figure 9 (these data can also be obtained from De Moor (2007)). For ease of system identification, the input and output signals $\{q_c(k), C_a(k)\}$ are then standardised and still designated as $\{q_c(k), C_a(k)\}$, i.e. $q_c = (q_c - \mathrm{Mean}(q_c))/\mathrm{Std}(q_c)$ and $C_a = (C_a - \mathrm{Mean}(C_a))/\mathrm{Std}(C_a)$.
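The standardisation described above, and the de-standardisation needed to recover the original amplitude of a model output, can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def standardise(x):
    """Return (x - Mean(x)) / Std(x) together with the mean and standard deviation used."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def destandardise(z, mean, std):
    """Map a standardised signal back to its original amplitude."""
    return np.asarray(z, dtype=float) * std + mean

# Hypothetical coolant flow samples in L/min.
qc = np.array([90.0, 95.0, 100.0, 105.0, 111.0])
qc_std, qc_mean, qc_scale = standardise(qc)
qc_back = destandardise(qc_std, qc_mean, qc_scale)
```

Keeping the mean and standard deviation of the estimation data is what allows predictions on the validation set to be mapped back consistently.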

The 7500 data points were divided into two sets: the estimation set, consisting of the first 6000 data points, and the validation set, consisting of the remaining 1500 data points, for model testing.

Using a first-order 2-DWSDP model for this system, with the finest and coarsest scaling parameters chosen to be −1 and 3, the final identified model is found to be:

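$$
\begin{aligned}
C_a(k) ={}& \left[
\begin{aligned}
& 0.8685\,\psi^{[2]}_{3,0,0}(x_1, x_2) \\
& + 0.5903\,\psi^{[2]}_{2,1,1}(x_1, x_2) \\
& + 0.0117\,\psi^{[2]}_{0,4,2}(x_1, x_2) \\
& + 0.2622\,\psi^{[2]}_{1,1,-1}(x_1, x_2) \\
& + 0.0568\,\psi^{[2]}_{-1,9,5}(x_1, x_2)
\end{aligned}
\right]_{\{C_a(k-1),\,q_c(k-1)\}} C_a(k-1) \\
&+ \left[
\begin{aligned}
& 0.1482\,\psi^{[2]}_{3,0,0}(x_1, x_2) \\
& + 0.2241\,\psi^{[2]}_{2,0,-1}(x_1, x_2) \\
& + 0.1092\,\psi^{[2]}_{1,0,1}(x_1, x_2) \\
& + 0.0488\,\psi^{[2]}_{0,2,1}(x_1, x_2) \\
& + 0.0931\,\psi^{[2]}_{1,1,1}(x_1, x_2)
\end{aligned}
\right]_{\{C_a(k-1),\,q_c(k-1)\}} q_c(k-1)
\end{aligned} \tag{81}
$$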

Figure 8. A CSTR’s schematic representation.

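Table 3. CSTR parameters.

| Parameters | Description | Nominal values |
|---|---|---|
| $q$ | Process flow rate | 100 L/min |
| $\nu$ | Reactor volume | 100 L |
| $k_0$ | Reaction rate constant | $7.2 \times 10^{10}$ min$^{-1}$ |
| $E/R$ | Activation energy | $1 \times 10^{4}$ K |
| $T_0$ | Feed temperature | 350 K |
| $T_{c0}$ | Inlet coolant temperature | 350 K |
| $\Delta H$ | Heat of reaction | $-2 \times 10^{5}$ cal/mol |
| $C_p$, $C_{pc}$ | Specific heats | 1 cal/g/K |
| $\rho$, $\rho_c$ | Liquid densities | $1 \times 10^{3}$ g/L |
| $h_a$ | Heat transfer coefficient | $7 \times 10^{5}$ cal/min/K |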



where

$$
\left[\psi^{[2]}_{i,j_1,j_2}(x_1, x_2)\right]_{\{c(k),\,d(k)\}} = \psi^{[2]}_{i,j_1,j_2}[c(k), d(k)] \tag{82}
$$

$$
\psi^{[2]}(x_1, x_2) = (1 - x_1^2)(1 - x_2^2)\,e^{-0.5(x_1^2 + x_2^2)} \tag{84}
$$

Figure 10(a) compares the predicted output of the model (Equation (81)), which is recovered to its original amplitude by de-standardisation, with the actual output over the estimation set; their associated residual is shown in Figure 10(b). Figure 11 compares the model’s iterative (simulated) output to the actual output signal over the whole data set. This demonstrates that the identified first-order 2-DWSDP model (10 terms) characterises the dynamic behaviour of this CSTR very well. The simplicity and effectiveness of this model make it attractive for future applications, such as the design of a nonlinear control system.

5.2.2. In comparison to a polynomial-based approach

As in the previous example, we provide a comparison between the proposed approach and a polynomial-based approach in which a polynomial function is used to parameterise the respective 2-DSDP relationship (Figure 12). Using this polynomial-based approach, the above-mentioned CSTR system is identified as

$$
\begin{aligned}
C_a(k) ={}& \left[0.0001\,x_1^4 x_2^4 - 0.0007\,x_1^4 x_2^2 + 0.0025\,x_1^4 + 0.8325\right]_{\{C_a(k-1),\,q_c(k-1)\}} C_a(k-1) \\
&+ \left[
\begin{aligned}
& -0.0001\,x_1^4 x_2^4 + 0.0004\,x_1^2 x_2^4 \\
& + 0.0001\,x_1^3 x_2^4 - 0.0005\,x_1^4 x_2^2 \\
& - 0.0007\,x_2^4 + 0.1674
\end{aligned}
\right]_{\{C_a(k-1),\,q_c(k-1)\}} q_c(k-1)
\end{aligned} \tag{85}
$$

To facilitate the comparison, a measurement index, the mean-squared error (MSE), is used to measure the performance of the identified models. This index is defined as

$$
\mathrm{MSE} = \frac{\sum_{k=1}^{N_{\mathrm{test}}} |y(k) - \hat{y}(k)|^2}{\sum_{k=1}^{N_{\mathrm{test}}} |y(k) - \bar{y}|^2} \tag{86}
$$

in which $y$ and $\hat{y}$ correspond to the actual measurement and the model’s simulated output on the testing set, and $\bar{y} = (1/N_{\mathrm{test}}) \sum_{k=1}^{N_{\mathrm{test}}} y(k)$.

The MSE of (85) calculated over the validation set (points 6001 to 7500) is 0.0473. This is larger than the MSE of (81) on the same testing data set, which is 0.0325.

Figure 9. CSTR data: (a) output $C_a(k)$ and (b) input $q_c(k)$, plotted against the sampling index.



Figure 10. CSTR: (a) Comparison between the actual output (solid) and model (81) prediction (dot-dot) over the estimation set, and (b) their associated residual.

Figure 11. CSTR: (a) Comparison between the actual output (solid) and model iterative output of (81) (dash-dash) over the whole data, (b) their associated residual and (c) a zoom-in view over the validation set (the last 1500 samples).



This implies that, for this example, the proposed approach may be advantageous over the polynomial-based approach.
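The normalised index (86) used in this comparison can be computed as in the following sketch. The function name `normalised_mse` and the toy signals are hypothetical and serve only to illustrate the definition.

```python
import numpy as np

def normalised_mse(y, y_hat):
    """Index (86): squared simulation error divided by the squared deviation of y from its mean."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2))

# Toy check of the two reference cases.
y = np.array([0.08, 0.10, 0.12, 0.11, 0.09])
perfect = normalised_mse(y, y)                          # model reproduces the data exactly
mean_only = normalised_mse(y, np.full_like(y, y.mean()))  # model outputs the mean of the data
```

A value of 0 corresponds to a perfect fit and a value of 1 to a model that does no better than the mean of the data, so the values 0.0473 and 0.0325 quoted above can be read on that scale.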

6. Conclusions

A new class of SDP models, called the 2-DWSDP model, has been presented in this article for nonlinear system identification. Using this approach, multi-dimensional state dependency has been developed, providing an alternative extension to the existing SDP modelling approach, which is based on single state dependency. In addition, the associated nonlinear model structure selection problem is systematically solved by first choosing a set of candidate model structures based on the characteristics of the wavelet, then exploiting the PRESS criterion and forward regression in conjunction with OD to yield a more parsimonious nonlinear system model. The parameter estimation procedure also automatically eliminates the terms associated with the ill-conditioning problem in the algorithm.

The contributions of this article can be summarised as follows:

(1) To the best of our knowledge, the reported results are among the first on a systematic development of 2-DSDP models for nonlinear system identification. The advantage of this model structure over the existing SDP model structure is that it takes into account the interactions between the model’s various output/input terms. Together with its relative simplicity, this makes the proposed approach more practical and very useful for a wide range of engineering applications.

(2) The proposed 2-DWSDP model structure is inherently stochastic. Therefore, the uncertainty associated with the parameter estimates is taken into account in the identification methodology. This is often very useful for practical applications. As demonstrated in the simulation examples, the proposed approach works very well in the presence of a substantial amount of noise, despite the bias in the parameter estimates. This bias depends on the signal-to-noise ratio of the system. Nevertheless, the extension of this work to obtain unbiased and consistent parameter estimates using IV in the presence of a high level of noise is underway and is to be reported in the future.

Through the simulation examples, the merits of the proposed approach have been illustrated. In particular, a relative comparison to a polynomial-based approach was also provided, demonstrating the advantages of the developed technique.

Figure 12. Example 2: 2-DSDP plots: (a) $\hat{f}_1(x_1, x_2)$ and (b) $\hat{g}_1(x_1, x_2)$.

Notes

1. The mother wavelet has nonzero values within this range. Outside this range, it has zero or insignificant values, which are assumed to be zero.

2. A small value of $i_{\min}$ results in a large number of wavelet elements with higher-frequency characteristics being contained in the function library. Conversely, with a large value of $i_{\max}$, the function library will contain a large number of wavelet elements with lower-frequency features.

3. The difference between the overparameterised (original) model’s PRESS value and the one calculated by excluding a term from the original model.

4. That is, the output obtained by generating the deterministic model output from the model input alone, without any reference to the output measurements.

Notes on contributors

Liuping Wang received her PhD in 1989 from the University of Sheffield, UK; subsequently, she was an adjunct associate professor in the Department of Chemical Engineering at the University of Toronto, Canada. From 1998 to 2002 she was a senior lecturer and research coordinator in the Center for Integrated Dynamics and Control, University of Newcastle, Australia, before joining RMIT University, where she is a professor and Head of Discipline of Electrical Engineering. She is the author of two books, joint editor of one book and has published over 130 papers. L. Wang has been actively engaged in industry-oriented research and development since the completion of her PhD studies. While working at the University of Toronto, Canada, she was the co-founder of an Industry Consortium for identification of chemical processes. Since her arrival in Australia in 1998, she has been working with Australian government organisations and companies in the areas of food manufacturing, mining, automotive and power services. She leads the Control Systems program in the Australian Advanced Manufacturing Cooperative Research Center (AMCRC), which develops next-generation technology platforms for the manufacturing industry. She is on the board of directors of the Australian Power Academy, which promotes power engineering education and raises scholarships from the power industry to support undergraduate students.

Nguyen-Vu Truong received the BEng (honours) degree and the PhD degree, both in control engineering, from Petronas University of Technology (Malaysia) and RMIT University (Australia), respectively. Since 2008, he has been working as a research fellow at the School of Electrical and Computer Engineering, RMIT University. His research interests are nonlinear system identification, wavelet theory and its applications.

References

Baudat, G., and Anouar, F. (2001), ‘Kernel-based Methods and Function Approximation’, in Proceedings of the 2001 International Joint Conference on Neural Networks, Washington, DC, USA, pp. 1244–1249.

Billings, S.A., Chen, S., and Korenberg, M.J. (1989), ‘Identification of Nonlinear MIMO Systems Using Forward Regression Orthogonal Estimator’, International Journal of Control, 49, 2157–2189.

Billings, S.A., and Wei, H.L. (2005), ‘A New Class of Wavelet Networks for Nonlinear System Identification’, IEEE Transactions on Neural Networks, 16(4), 862–874.

Billings, S.A., and Wei, H.L. (2008), ‘An Adaptive Orthogonal Search Algorithm for Model Subset Selection and Non-linear System Identification’, International Journal of Control, 81(5), 714–724.

Chen, S., Billings, S.A., and Luo, W. (1989), ‘Orthogonal Least Squares Methods and Their Applications in Nonlinear System Identification’, International Journal of Control, 50, 1873–1896.

Chui, K.C. (1992), An Introduction to Wavelets, New York: Academics.

De Moor, B.L.R. (ed.) (2007), DaIsy: Database for Identification of Systems, Department of Electrical Engineering, ESAT/SISTA, K.U.Leuven, Belgium. http://homes.esat.kuleuven.be/~smc/daisy/ [Continuous stirred tank reactor, Process Industry Systems, 98-002].

Gonzalez, J., Rojas, I., Ortega, J., Pomares, H., Fernandez, F.J., and Diaz, A.F. (2003), ‘Multiobjective Evolutionary Optimization of the Size, Shape and Position Parameters of Radial Basis Function Networks for Functional Approximation’, IEEE Transactions on Neural Networks, 14(6), 1478–1495.

Hong, X., Harris, C.J., Chen, S., and Sharkey, P.M. (2003a), ‘Robust Nonlinear System Identification Methods Using Forward Regression’, IEEE Transactions on Systems, Man and Cybernetics-Part A: Systems and Humans, 33(4), 514–523.

Hong, X., Sharkey, P.M., and Warwick, K. (2003b), ‘Automatic Nonlinear Predictive Model Construction Algorithm Using Forward Regression and the PRESS Statistic’, IEE Proceedings: Control Theory and Applications, 150(3), 245–254.

Hong, X., Sharkey, P.M., and Warwick, K. (2003c), ‘A Robust Nonlinear Identification Algorithm Using PRESS Statistic and Forward Regression’, IEEE Transactions on Neural Networks, 14(2), 454–458.

Lightbody, G., and Irwin, G.W. (1997), ‘Nonlinear Control Structures Based on Embedded Neural System Models’, IEEE Transactions on Neural Networks, 8(3), 553–567.

Liu, G.P., Billings, S.A., and Kadirkamanathan, V. (1998), ‘Nonlinear System Identification Using Wavelet Network’, in Proceedings of the UKACC International Conference on Control, pp. 1248–1253.

Mertin, A. (1999), Signal Analysis: Wavelet, Filter Banks, Time-Frequency Transforms and Applications, London: John Wiley & Sons.



Meyer, Y. (1992), Wavelet and Operator, Cambridge: Cambridge University Press.

Savakis, A.E., Stoughton, J.W., and Kanetkar, S.V. (1989), ‘Spline Function Approximation for Velocimeter Doppler Frequency Measurement’, IEEE Transactions on Instrumentation and Measurement, 8(4), 892–897.

Truong, N.V., and Wang, L. (2008), ‘Nonlinear System Identification in a Noisy Environment Using Wavelet Based SDP Models’, in Proceedings of the 17th IFAC World Congress, Seoul, S. Korea, pp. 7439–7444.

Truong, N.V., and Wang, L. (2009), ‘Benchmark Nonlinear System Using Wavelet Based SDP Models’, in Proceedings of the 17th IFAC Symposium on System Identification (SYSID 2009), Saint-Malo, France.

Truong, N.V., Wang, L., and Huang, J.M. (2007a), ‘Nonlinear Modeling of a Magnetic Bearing Using SDP Model and Linear Wavelet Parameterization’, in Proceedings of the 2007 American Control Conference, New York, USA, pp. 2254–2259.

Truong, N.V., Wang, L., and Young, P.C. (2006), ‘Nonlinear System Modeling Based on Nonparametric Identification and Linear Wavelet Estimation of SDP Models’, in Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, USA, pp. 2523–2528.

Truong, N.V., Wang, L., and Young, P.C. (2007b), ‘Nonlinear System Modeling Based on Nonparametric Identification and Linear Wavelet Estimation of SDP Models’, International Journal of Control, 80(5), 774–788.

Young, P.C. (1993), ‘Time Variable and State Dependent Modelling of Nonstationary and Nonlinear Time Series’, in Developments in Time Series, Volume in Honour of Maurice Priestley, ed. T.S. Rao, London: Chapman and Hall, pp. 374–413.

Young, P.C. (1998), ‘Data-based Mechanistic Modelling of Engineering Systems’, Journal of Vibration and Control, 4, 5–28.

Young, P.C. (2001), ‘The Identification and Estimation of Nonlinear Stochastic Systems’, in Nonlinear Dynamics and Statistics, ed. A.I. Mees, Boston: Birkhauser, pp. 127–166.

Young, P.C., McKenna, P., and Bruun, J. (2001), ‘Identification of Nonlinear Stochastic Systems by State Dependent Parameter Estimation’, International Journal of Control, 74, 1837–1857.

