
Identification of Nonlinear Systems using Polynomial Nonlinear State Space Models

FACULTY OF ENGINEERING
Department of Fundamental Electricity and Instrumentation

Thesis submitted in fulfillment of the requirements for the degree of Doctor in de Ingenieurswetenschappen (Doctor in Engineering) by

ir. Johan Paduart

Chair: Prof. Dr. ir. Annick Hubin (Vrije Universiteit Brussel)

Vice chair: Prof. Dr. ir. Jean Vereecken (Vrije Universiteit Brussel)

Secretary: Prof. Dr. Steve Vanlanduit (Vrije Universiteit Brussel)

Advisers: Prof. Dr. ir. Johan Schoukens (Vrije Universiteit Brussel)
Prof. Dr. ir. Rik Pintelon (Vrije Universiteit Brussel)

Jury: Prof. Dr. ir. Lennart Ljung (Linköping University)
Prof. Dr. ir. Johan Suykens (Katholieke Universiteit Leuven)
Prof. Dr. ir. Jan Swevers (Katholieke Universiteit Leuven)
Prof. Dr. ir. Yves Rolain (Vrije Universiteit Brussel)


Print: Grafikon, Oostkamp

© Vrije Universiteit Brussel - ELEC Department
© Johan Paduart

2007 Uitgeverij VUBPRESS Brussels University Press
VUBPRESS is an imprint of ASP nv (Academic and Scientific Publishers nv)
Ravensteingalerij 28
B-1000 Brussels
Tel. +32 (0)2 289 26 50
Fax +32 (0)2 289 26 59
E-mail: [email protected]

ISBN 978 90 5487 468 3
NUR 910
Legal deposit D/2008/11.161/012

All rights reserved. No parts of this book may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author or the ELEC Department of the Vrije Universiteit Brussel.


Leaving the calm water of the linear sea

for the more unpredictable waves of the nonlinear ocean.


Acknowledgements

Researching and writing a PhD thesis is not something one can do all on one's own. Therefore, I would like to thank everyone who contributed to this thesis in one way or another.

First of all, I am grateful to Johan Schoukens and Rik Pintelon for giving me the opportunity to

pursue my PhD degree at the ELEC department, and for introducing me to this fascinating

research field. Without the guidance of the ‘patjes’ (Jojo, Rik and Yves) over the past four

years, this work would have been a much tougher job.

I would also like to thank all my colleagues at the ELEC department for the stimulating work

environment, the exchange of interesting ideas, and the relaxing talks.

I am indebted to Lieve Lauwers, Rik Pintelon, Johan Schoukens, and Wendy Van Moer for

proofreading (parts of) my thesis. A special word of thanks goes to Lieve, who has polished

the rough edges of this text. Lieve, you have made my thesis much more pleasant to read.

Furthermore, I thank Tom Coen, Thomas Delwiche, Liesbeth Gommé, and Kris Smolders for

letting me use their measurements. I very much appreciate your difficult and time-consuming

experimental work.

Last but not least, I am grateful to my family and my lieve Lieve for their unconditional

support and love.

Johan Paduart


Table of Contents

Operators and Notational Conventions
Symbols
Abbreviations

Chapter 1: Introduction
1.1 What are Nonlinear Systems?
1.2 Why build Nonlinear Models?
1.3 A Framework for Nonlinear Modelling
1.3.1 Approximation Criteria
1.3.2 The Volterra-Wiener Theory
1.3.3 Continuous-time versus Discrete-time
1.3.4 Single Input, Single Output versus Multiple Input, Multiple Output
1.3.5 What is not included in the Volterra Framework?
1.4 Outline of the thesis
1.5 Contributions
1.6 Publication List

Chapter 2: The Best Linear Approximation
2.1 Introduction
2.2 Class of Excitation Signals
2.2.1 Random Phase Multisine
2.2.2 Gaussian Noise
2.3 Properties of the Best Linear Approximation
2.3.1 Single Input, Single Output Systems
2.3.2 Multiple Input, Multiple Output Systems
2.4 Some Properties of Nonlinear Systems
2.4.1 Response to a Sine Wave
2.4.2 Even and Odd Nonlinear Behaviour
2.4.3 The Multisine as a Detection Tool for Nonlinearities
2.5 Estimating the Best Linear Approximation
2.5.1 Single Input, Single Output Systems
2.5.2 Multiple Input, Multiple Output Systems
Appendix 2.A Calculation of the FRF Covariance from the Input/Output Covariances
Appendix 2.B Covariance of the FRF for Non Periodic Data

Chapter 3: Fast Measurement of Quantization Distortions
3.1 Introduction
3.2 The Multisine as a Detection Tool for Non-idealities
3.3 DSP Errors
3.3.1 Truncation Errors of the Filter Coefficients
3.3.2 Finite Precision Distortion
3.3.3 Finite Range Distortion
3.3.4 Influence of the Implementation
3.4 Quality Analysis of Audio Codecs
3.5 Conclusion

Chapter 4: Identification of Nonlinear Feedback Systems
4.1 Introduction
4.2 Model Structure
4.3 Estimation Procedure
4.3.1 Best Linear Approximation
4.3.2 Nonlinear Feedback
4.3.3 Nonlinear Optimization
4.4 Experimental Results
4.4.1 Linear Model
4.4.2 Estimation of the Nonlinear Feedback Coefficients
4.4.3 Nonlinear Optimization
4.4.4 Upsampling
4.5 Conclusion
Appendix 4.A Analytic Expressions for the Jacobian

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems
5.1 Introduction
5.2 The Quest for a Good Model Structure
5.2.1 Volterra Models
5.2.2 NARX Approach
5.2.3 State Space Models
5.3 Polynomial Nonlinear State Space Models
5.3.1 Multinomial Expansion Theorem
5.3.2 Graded Lexicographic Order
5.3.3 Approximation Behaviour
5.3.4 Stability
5.3.5 Some Remarks on the Polynomial Approach
5.4 On the Equivalence with some Block-oriented Models
5.4.1 Hammerstein
5.4.2 Wiener
5.4.3 Wiener-Hammerstein
5.4.4 Nonlinear Feedback
5.4.5 Conclusion
5.5 A Step beyond the Volterra Framework
5.5.1 Duffing Oscillator
5.5.2 Lorenz Attractor
5.6 Identification of the PNLSS Model
5.6.1 Best Linear Approximation
5.6.2 Frequency Domain Subspace Identification
5.6.3 Nonlinear Optimization of the Linear Model
5.6.4 Estimation of the Full Nonlinear Model
Appendix 5.A Some Combinatorials
Appendix 5.B Construction of the Subspace Weighting Matrix from the FRF Covariance
Appendix 5.C Nonlinear Optimization Methods
Appendix 5.D Explicit Expressions for the PNLSS Jacobian
Appendix 5.E Computation of the Jacobian regarded as an alternative PNLSS system

Chapter 6: Applications of the Polynomial Nonlinear State Space Model
6.1 Silverbox
6.1.1 Description of the DUT
6.1.2 Description of the Experiments
6.1.3 Best Linear Approximation
6.1.4 Nonlinear Model
6.1.5 Comparison with Other Approaches
6.2 Combine Harvester
6.2.1 Description of the DUT
6.2.2 Description of the Experiments
6.2.3 Best Linear Approximation
6.2.4 Nonlinear Model
6.3 Semi-active Damper
6.3.1 Description of the DUT
6.3.2 Description of the Experiments
6.3.3 Best Linear Approximation
6.3.4 Nonlinear Model
6.4 Quarter Car Set-up
6.4.1 Description of the DUT
6.4.2 Description of the Experiments
6.4.3 Best Linear Approximation
6.4.4 Nonlinear Model
6.5 Robot Arm
6.5.1 Description of the DUT
6.5.2 Description of the Experiments
6.5.3 Best Linear Approximation
6.5.4 Nonlinear Model
6.6 Wiener-Hammerstein
6.6.1 Description of the DUT
6.6.2 Description of the Experiments
6.6.3 Level of Nonlinear Distortions
6.6.4 Best Linear Approximation
6.6.5 Nonlinear Model
6.6.6 Comparison with a Block-oriented Approach
6.7 Crystal Detector
6.7.1 Description of the DUT
6.7.2 Description of the Experiments
6.7.3 Best Linear Approximation
6.7.4 Nonlinear Model
6.7.5 Comparison with a Block-oriented Approach

Chapter 7: Conclusions

References

Publication List


Operators and Notational Conventions

$\mathbb{N}$, $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{C}$ : outline upper case font denotes a set; these are, respectively, the natural, the integer, the real, and the complex numbers
$\otimes$ : the Kronecker matrix product
$\operatorname{Re}(x)$ : real part of $x$
$\operatorname{Im}(x)$ : imaginary part of $x$
$\arg\min_x f(x)$ : the minimizing argument of $f(x)$
$O(x)$ : an arbitrary function with the property $\lim_{x \to 0} O(x)/x < \infty$
$\hat{\theta}$ : estimated value of $\theta$
$\bar{x}$ : complex conjugate of $x$
subscript $\mathrm{re}$ : $A_{\mathrm{re}} = \begin{bmatrix} \operatorname{Re}(A) \\ \operatorname{Im}(A) \end{bmatrix}$
superscript $\mathrm{re}$ : $A^{\mathrm{re}} = \begin{bmatrix} \operatorname{Re}(A) & \operatorname{Im}(A) \end{bmatrix}$
subscript $u$ : with respect to the input of the system
subscript $y$ : with respect to the output of the system
$x^{(r)}$ : vector which contains all the distinct nonlinear combinations of the elements of vector $x$, of exactly degree $r$
$x^{\{r\}}$ : vector which contains all the distinct nonlinear combinations of the elements of vector $x$, from degree 2 up to $r$
superscript $T$ : matrix transpose
superscript $-T$ : transpose of the inverse matrix
superscript $H$ : Hermitian transpose, i.e., complex conjugate transpose of a matrix
superscript $-H$ : Hermitian transpose of the inverse matrix
superscript $+$ : Moore-Penrose pseudo-inverse
$\angle x$ : phase (argument) of the complex number $x$
$A[:, j]$ : $j$-th column of $A$
$A[i, :]$ : $i$-th row of $A$
$\kappa(A) = \max_i \sigma_i(A) / \min_i \sigma_i(A)$ : condition number of an $n \times m$ matrix $A$
$|x| = \sqrt{(\operatorname{Re}(x))^2 + (\operatorname{Im}(x))^2}$ : magnitude of a complex number $x$
$\operatorname{diag}(A_1, A_2, \ldots, A_K)$ : block diagonal matrix with blocks $A_k$, $k = 1, 2, \ldots, K$
$\operatorname{herm}(A) = (A + A^H)/2$ : Hermitian symmetric part of the matrix $A$
$\operatorname{rank}(A)$ : rank of the $n \times m$ matrix $A$, i.e., the maximum number of linearly independent rows (columns) of $A$
$\operatorname{vec}(A)$ : a column vector formed by stacking the columns of the matrix $A$ on top of each other
$\mathbb{E}\{\cdot\}$ : mathematical expectation
$\operatorname{Cov}(X, Y)$ : cross-covariance matrix of $X$ and $Y$
$\operatorname{var}(x)$ : variance of $x$
$C_X = \operatorname{Cov}(X) = \operatorname{Cov}(X, X)$ : covariance matrix of $X$
$\hat{C}_X$ : sample covariance matrix of $X$
$C_{XY} = \operatorname{Cov}(X, Y)$ : cross-covariance matrix of $X$ and $Y$
$\hat{C}_{XY}$ : sample cross-covariance matrix of $X$ and $Y$
$\operatorname{DFT}(x(t))$ : Discrete Fourier Transform of the samples $x(t)$, $t = 0, 1, \ldots, N-1$
$I_m$ : $m \times m$ identity matrix
$0_{m \times n}$ : $m \times n$ zero matrix
$S_{XX}(j\omega)$ : auto-power spectrum of $x(t)$
$S_{XY}(j\omega)$ : cross-power spectrum of $x(t)$ and $y(t)$
$\hat{X}$ : sample mean of $X$
$\mu_x = \mathbb{E}\{x\}$ : mean value of $x$
$\sigma_x^2 = \operatorname{var}(x)$ : variance of $x$
$\hat{\sigma}_x^2$ : sample variance of $x$
$\sigma_{xy}^2 = \operatorname{covar}(x, y)$ : covariance of $x$ and $y$
$\hat{\sigma}_{xy}^2$ : sample covariance of $x$ and $y$


Symbols

$C_{\mathrm{BLA}}(k)$ : total covariance of the Best Linear Approximation (MIMO)
$C_n(k)$ : covariance of the BLA due to the measurement noise (MIMO)
$C_{\mathrm{NL}}(k)$ : covariance of the BLA due to the stochastic nonlinear contributions (MIMO)
$f$ : frequency
$F$ : number of frequency domain data samples
$f_s$ : sampling frequency
$G(j\omega)$ : frequency response function
$G_{\mathrm{BLA}}(j\omega)$ : best linear approximation of a nonlinear plant
$j$ : $j^2 = -1$
$k$ : frequency index
$M$ : number of (repeated) experiments
$N$ : number of time domain data samples
$n_a$, $n_u$, $n_y$ : state dimension, input dimension, and output dimension
$n_\theta$ : dimension of the parameter vector $\theta$
$s$ : Laplace transform variable
$s_k$ : Laplace transform variable evaluated along the imaginary axis at DFT frequency $k$: $s_k = j\omega_k$
$t$ : continuous- or discrete-time variable
$T_s$ : sampling period
$U(k)$, $Y(k)$ : discrete Fourier transform of the samples $u(tT_s)$ and $y(tT_s)$, $t = 0, 1, \ldots, N-1$
$U_k$, $Y_k$ : Fourier coefficients of the periodic signals $u(t)$, $y(t)$
$U(j\omega)$, $Y(j\omega)$ : Fourier transform of $u(t)$ and $y(t)$
$u(t)$, $y(t)$ : input and output time signals
$V_F(\theta, Z)$ : cost function based on $F$ measurements
$z$ : Z-transform variable
$z_k$ : Z-transform variable evaluated along the unit circle at DFT frequency $k$: $z_k = e^{j\omega_k T_s} = e^{j2\pi k/N}$
$\varepsilon(\theta, Z)$ : column vector of the model residuals (dimension $F$)
$J(\theta, Z) = \partial \varepsilon(\theta, Z) / \partial \theta$ : gradient of the residuals $\varepsilon(\theta, Z)$ w.r.t. the parameters $\theta$ (dimension $F \times n_\theta$)
$\theta$ : column vector of the model parameters
$\zeta(t)$ : nonlinear vector map of the state equation
$\eta(t)$ : nonlinear vector map of the output equation
$\xi(t)$ : column vector that contains the stacked state and input vectors
$\sigma_{\mathrm{BLA}}^2(k)$ : total variance of the Best Linear Approximation (SISO)
$\sigma_n^2(k)$ : variance of the BLA due to the measurement noise (SISO)
$\sigma_{\mathrm{NL}}^2(k)$ : variance of the BLA due to the stochastic nonlinear contributions (SISO)
$\omega = 2\pi f$ : angular frequency


Abbreviations

BL Band-Limited (measurement set-up)
BLA Best Linear Approximation
DFT Discrete Fourier Transform
DUT Device Under Test
FIR Finite Impulse Response
FFT Fast Fourier Transform
FRF Frequency Response Function
GN Gaussian Noise
iid independent identically distributed
IIR Infinite Impulse Response
LS Least Squares
LTI Linear Time Invariant
MIMO Multiple Input Multiple Output
MISO Multiple Input Single Output
NARX Nonlinear Auto Regressive with eXternal input (model)
NLS Nonlinear Least Squares
NOE Nonlinear Output Error (model)
pdf probability density function
PID Proportional-Integral-Derivative (controller)
PISPOT Periodic Input, Same Periodic OuTput
PNLSS Polynomial NonLinear State Space
PSD Power Spectral Density
RF Radio Frequency
RBF Radial Basis Function
RMS Root Mean Square (value)
RMSE Root Mean Square Error
rpm rotations per minute
RPM Random Phase Multisine
SA State Affine (model)
SISO Single Input Single Output
SDR Signal-to-Distortion Ratio
SNR Signal-to-Noise Ratio
SVD Singular Value Decomposition
w.p.1 with probability one
WLS Weighted Least Squares


CHAPTER 1

INTRODUCTION


1.1 What are Nonlinear Systems?

It is difficult, if not impossible, to give a conclusive definition of nonlinear systems. The famous remark by the mathematician Stan Ulam illustrates this [6]: “using a term like ‘nonlinear science’ is like referring to the bulk of zoology as the study of non-elephant animals”.

Nevertheless, the world around us is filled with nonlinear phenomena, and we are very

familiar with some of these effects. Essentially, a system is nonlinear when the rule of three is

not applicable to its behaviour. Tax rating systems in Belgium, for instance, behave

nonlinearly: the higher someone’s gross salary gets, the higher his/her average tax rate

becomes. Audio amplifiers are another good example of nonlinear systems. When their

volume is turned up too eagerly, the signals they produce get clipped, and the music we hear

becomes distorted instead of sounding louder. The weather system also behaves nonlinearly:

slight perturbations of this system can lead to massive modifications after a long period of

time. This is the so-called butterfly effect. It explains why it is so hard to accurately predict

the weather with a time horizon of more than a couple of days. In some situations, nonlinear

behaviour is a desired effect. Video and audio broadcasting, mobile telephony, and CMOS

technology would simply be impossible without nonlinear devices such as transistors and

mixers. Hence, it is important to understand and model their behaviour.

Finally, let us define, in a slightly more rigorous way, what nonlinear systems are. To this end, we start by defining a linear system $T\{\cdot\}$. With zero initial conditions, the system is linear if it obeys the superposition principle and the scaling property

$$T\{\alpha u_1(t) + \beta u_2(t)\} = \alpha T\{u_1(t)\} + \beta T\{u_2(t)\},$$   (1-1)

where $u_1(t)$ and $u_2(t)$ are two arbitrary input signals as a function of time, and $\alpha$ and $\beta$ are two arbitrary scalar numbers. When the superposition principle or the scaling property is not fulfilled, we call $T\{\cdot\}$ a nonlinear system. The most important implication of this open definition is that there exists no general nonlinear framework. That is why studying nonlinear systems is such a difficult task.
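As a quick illustration (not part of the original text), the sketch below checks property (1-1) numerically for a linear gain and for a saturating static nonlinearity; the tanh characteristic and the signal choices are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(256)
u1 = np.sin(2 * np.pi * 0.01 * t)      # two arbitrary input signals
u2 = rng.standard_normal(t.size)
alpha, beta = 0.7, -1.3                # two arbitrary scalars

def superposition_error(T):
    """Worst-case violation of T{a*u1 + b*u2} = a*T{u1} + b*T{u2}."""
    lhs = T(alpha * u1 + beta * u2)
    rhs = alpha * T(u1) + beta * T(u2)
    return np.max(np.abs(lhs - rhs))

print(superposition_error(lambda u: 2.0 * u))  # linear gain: error at machine precision
print(superposition_error(np.tanh))            # saturating system: clearly nonzero
```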


1.2 Why build Nonlinear Models?

From the handful of examples listed in the previous section, it is clear that many real-life

phenomena are nonlinear. Often, it is possible to use linear models to approximate their

behaviour. This is an attractive idea because the linear framework is well established.

Furthermore, linear models are easy to interpret and to understand. Building linear models

usually requires significantly less effort than the estimation of nonlinear models.

Unfortunately, linear approximations are only valid for a given input range. Hence, there has

been a tendency towards nonlinear modelling in various application fields during the last

decades. Technological innovations have resulted in fewer limitations at the computational, memory, and data-acquisition levels, making nonlinear modelling a more feasible option. In

order to build models for the studied nonlinear devices, we will employ system identification

methods. Classic text books about system identification are [75], [38], and [56]. The basic

goal of system identification is to identify mathematical models from the available input/

output data. This is often achieved via the minimization of a cost function embedded in a

statistical framework (Figure 1-1). An excellent starting point for nonlinear modelling is [70].

Other reference works on nonlinear systems and nonlinear modelling include

[60],[59],[7],[5],[33],[77], and [80].

In this thesis, we focus on the estimation of so-called simulation models: from measured

input/output data, we estimate a model that, given a new input data set, simulates the output

as well as possible. Such models can for instance be used to replace expensive experiments

by cheap computer simulations. The major difference with prediction error modelling is that

no past measured outputs are used to predict new output values.

As mentioned before, there is no general nonlinear framework. However, there exists a class

of nonlinear systems that has intensively been studied in the past, and which covers a broad

spectrum of ‘nice’ nonlinear behaviour: the class of Wiener systems. This class will be used as

a starting point in this thesis; it stems from the Volterra-Wiener theory which will be briefly

explained in what follows.

Figure 1-1. Basic idea of system identification: the cost function relates data and model.


1.3 A Framework for Nonlinear Modelling

1.3.1 Approximation Criteria

In this thesis, we will use models to approximate the behaviour of nonlinear systems. This

requires an approximation quality measure. For this, two principal convergence criteria can be

employed: convergence in mean square sense and uniform convergence.

Definition 1.1 (Convergence in the Mean) The model $f(u, \theta)$ converges in mean square sense to a system $f(u)$ if, for all $\varepsilon > 0$, there exists an $M_\varepsilon$, independent of $\theta$, such that for all $M > M_\varepsilon$:

$$\exists \theta_M \Rightarrow \forall u \in \mathbb{U}: \; \mathbb{E}\left\{ \left( f(u) - f(u, \theta_M) \right)^2 \right\} < \varepsilon, \quad \text{with } M = \dim(\theta),$$

where the expected value is taken over the class of excitation signals $\mathbb{U}$.

Definition 1.2 (Uniform Convergence) The model $f(u, \theta)$ converges uniformly to a system $f(u)$ if, for all $\varepsilon > 0$, there exists an $M_\varepsilon$, independent of $\theta$ and $u$, such that for all $M > M_\varepsilon$:

$$\exists \theta_M \Rightarrow \forall u \in \mathbb{U}: \; \left| f(u) - f(u, \theta_M) \right| < \varepsilon, \quad \text{with } M = \dim(\theta).$$

Note that uniform convergence is a stronger result than convergence in mean square sense, but the latter is often easier to obtain.

1.3.2 The Volterra-Wiener Theory

In order to set up a rigorous framework for the following chapter, we consider a particular class of nonlinear systems. A classical approach is to make use of the Volterra-Wiener theory, which is thoroughly described in [59] and [60]. A short overview of the results that will serve in the rest of this thesis is given here.


Volterra series can be seen as the dynamic extension of power series, and they are defined as the (infinite) sum of Volterra operators $H_n$. For an input $u(t)$ as a function of time, the output $y(t)$ of the series is given by

$$y(t) = \sum_{n=1}^{\infty} H_n[u(t)].$$   (1-2)

The $n$-th order continuous-time Volterra operator is defined as

$$H_n[u(t)] = \int_0^{+\infty} \cdots \int_0^{+\infty} h_n(\tau_1, \ldots, \tau_n)\, u(t - \tau_1) \cdots u(t - \tau_n)\, d\tau_1 \cdots d\tau_n,$$   (1-3)

where $h_n(\tau_1, \ldots, \tau_n)$ is the $n$-th order Volterra kernel. For a first order system ($n = 1$), equation (1-3) reduces to the well-known relation

$$H_1[u(t)] = \int_0^{+\infty} h(\tau)\, u(t - \tau)\, d\tau,$$   (1-4)

which is the convolution representation of a linear system with an impulse response $h(\tau)$. When the kernel $h_n$ is causal, it is zero for any negative argument:

$$h_n(\tau_1, \ldots, \tau_n) = 0 \quad \text{for } \tau_i < 0, \; i = 1, \ldots, n.$$   (1-5)

Because we restrict ourselves to causal systems, the lower integral limits in (1-3) and (1-4) are set equal to zero. Volterra series can be used to approximate the behaviour of a certain class of nonlinear systems. However, they can suffer from severe convergence problems, which is a common phenomenon for power series. This can occur for example in the presence of discontinuities like hard clipping or dead-zones. To overcome this difficulty, Wiener introduced the Wiener-G functionals [60], which are Volterra functionals orthogonalized with respect to white Gaussian input signals. Note that the Wiener-G functionals are only required to solve numerical issues. Hence, what follows holds for both Volterra and Wiener-G functionals. The Wiener theory states that any nonlinear system $f$ satisfying a number of conditions can be represented arbitrarily well in mean square sense by Volterra/Wiener-G functionals. The restrictions on the system $f$ are [60]:


1. $f$ is not explosive; in other words, the system's response to a bounded input sequence is finite;

2. $f$ has a finite memory, i.e., the present output becomes asymptotically independent of the past values of the input;

3. $f$ is causal and time-invariant.

The set of systems satisfying these conditions is known as the class of Wiener systems $\mathbb{W}$.

When approximating $f$ by a Volterra/Wiener-G functional $\hat{f}$, the mean square convergence is guaranteed over a finite time interval, with respect to the class of white Gaussian input signals $\mathbb{U}$:

$$\mathbb{E}\left\{ \left[ f(u(t)) - \hat{f}(u(t)) \right]^2 \right\} < \varepsilon, \quad t \in [0, T], \; f \in \mathbb{W}, \; \forall u \in \mathbb{U}.$$   (1-6)

Boyd and Chua achieved even more powerful results with Volterra series via the introduction of a concept called Fading Memory [5].

Definition 1.3 (Fading Memory) $f$ has Fading Memory on a subset $K$ of a compact set, if there is a decreasing function $w: \mathbb{R}^+ \to (0, 1]$ with $\lim_{t \to \infty} w(t) = 0$, such that for each $u_1 \in K$ and $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u_2 \in K$:

$$\sup_{t \leq 0} \, w(-t) \left| u_1(t) - u_2(t) \right| < \delta \;\Rightarrow\; \left| f(u_1(t)) - f(u_2(t)) \right| < \varepsilon.$$   (1-7)

Loosely explained, an operator has Fading Memory when two input signals that are close to each other in the recent past, but not necessarily in the remote past, yield present output signals that are close. This strengthened continuity requirement on $f$ makes it possible to obtain more powerful approximation results with Volterra series. Boyd and Chua have proved the uniform convergence of finite ($n < \infty$) Volterra series to any continuous-time Fading Memory system for the class of input signals with bounded amplitude and bounded slew rate, without any restrictions on the time interval ($T \to \infty$).


1.3.3 Continuous-time versus Discrete-time

Until now, we have only considered continuous-time Volterra series. However, this thesis mainly deals with the identification of discrete-time models. The discrete-time version of the $n$-th order causal Volterra operator is defined as

$$H_n[u(t)] = \sum_{\tau_1 = 0}^{\infty} \cdots \sum_{\tau_n = 0}^{\infty} h_n(\tau_1, \ldots, \tau_n)\, u(t - \tau_1) \cdots u(t - \tau_n),$$   (1-8)

where $h_n(\tau_1, \ldots, \tau_n)$ denotes the $n$-th order, discrete-time Volterra kernel.

In [5], the approximation properties of the Volterra series are shown under the Fading Memory assumption for discrete-time systems. The only difference with the continuous-time case is that the slew rate of the input signal does not need to be bounded.
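The sketch below (my own illustration; it assumes a finite kernel memory, which (1-8) does not impose) evaluates an $n$-th order discrete-time Volterra operator by direct summation over the kernel taps.

```python
import numpy as np
from itertools import product

def volterra_operator(h, u):
    """Evaluate y(t) = sum_{tau_1..tau_n} h(tau_1,...,tau_n) u(t-tau_1)...u(t-tau_n)
    for a finite, causal n-th order kernel h given as an n-dimensional array
    (eq. 1-8, truncated to the kernel's memory length)."""
    y = np.zeros(len(u))
    for taus in product(*(range(s) for s in h.shape)):
        coeff = h[taus]
        if coeff == 0.0:
            continue
        term = np.full(len(u), coeff)
        for tau in taus:
            shifted = np.zeros(len(u))
            shifted[tau:] = u[:len(u) - tau]   # u(t - tau), with u(t) = 0 for t < 0
            term = term * shifted
        y += term
    return y

u = np.random.default_rng(1).standard_normal(64)
h2 = np.outer([1.0, 0.5], [1.0, 0.5])          # a simple separable 2nd order kernel
y = volterra_operator(h2, u)                   # output of the quadratic operator H_2[u]
```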

1.3.4 Single Input, Single Output versus Multiple Input, Multiple Output

So far, only SISO (Single Input, Single Output) systems were considered, but in Chapter 5 we will deal with MIMO (Multiple Input, Multiple Output) systems as well. MIMO Volterra models with $n_y$ outputs are defined as $n_y$ separate MISO Volterra series. In the following, the analysis is only pursued for discrete-time systems. For notational simplicity, we define $u(t)$ as the vector of the $n_u$ assembled inputs at time instance $t$:

$$u(t) = \begin{bmatrix} u_1(t) \\ \vdots \\ u_{n_u}(t) \end{bmatrix}.$$   (1-9)

A MISO Volterra series is defined as the sum of Volterra functionals $H_n$:

$$y(t) = \sum_{n=0}^{\infty} H_n[u(t)].$$   (1-10)

Next, consider the $n$-th term of (1-10). The $n$-th order, discrete-time MISO Volterra functional is defined as


$$H_n[u(t)] = \sum_{j_1, \ldots, j_n} H_n^{j_1, \ldots, j_n}[u(t)],$$   (1-11)

where $j_1, \ldots, j_n$ are input indices between $1$ and $n_u$, and

$$H_n^{j_1, \ldots, j_n}[u(t)] = \sum_{\tau_1 = 0}^{\infty} \cdots \sum_{\tau_n = 0}^{\infty} h_n^{j_1, \ldots, j_n}(\tau_1, \ldots, \tau_n)\, u_{j_1}(t - \tau_1) \cdots u_{j_n}(t - \tau_n).$$   (1-12)

To determine the number of distinct operators $H_n^{j_1, \ldots, j_n}[u(t)]$ of order $n$, we need to apply some combinatorials. In Appendix 5.A, the same problem is solved in a different context, but the idea remains the same. Hence, we distinguish

$$\binom{n_u + n - 1}{n} = \frac{(n + n_u - 1)!}{n! \, (n_u - 1)!}$$   (1-13)

different terms.
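For example, the count in (1-13) can be evaluated directly (illustrative snippet, not from the thesis):

```python
from math import comb

def n_volterra_terms(n_u, n):
    """Number of distinct n-th order MISO Volterra operators for n_u inputs, eq. (1-13)."""
    return comb(n_u + n - 1, n)

# e.g. a 2-input system has 3 distinct 2nd order terms (u1*u1, u1*u2, u2*u2)
print(n_volterra_terms(2, 2))   # -> 3
print(n_volterra_terms(3, 2))   # -> 6
```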

1.3.5 What is not included in the Volterra Framework?

Since Volterra series are open loop models, they cannot represent a number of closed loop phenomena. Bearing the negative definition of nonlinear systems in mind, it is impossible to give an exhaustive list of the systems that cannot be approximated by Volterra models. However, it is possible to sum up a couple of examples.

• No Subharmonic Generation

In [60], it was shown that the steady-state response of a Volterra series to a harmonic input is harmonic, and has the same period as the input. Hence, systems that generate subharmonics are excluded from the Volterra-Wiener framework. For this reason, we sometimes say that Wiener systems are PISPOT (Periodic Input, Same Periodic OuTput) systems. An example of subharmonic generation is given in the “Duffing Oscillator” on p. 111.


• No Chaotic Behaviour

Chaos is typically the result of a nonlinear dynamic system whose output depends extremely sensitively on the initial conditions. Such behaviour

conflicts with the finite memory requirement of the Wiener class: the

present output does not become asymptotically independent of the past.

An example of chaotic behaviour is given in “Lorenz Attractor” on p. 113.

• No Multiple-valued Output

Volterra series are a single-valued output representation. Hence, they

cannot represent systems that exhibit output multiplicity, like for instance

hysteresis.


1.4 Outline of the thesis

All the work in this thesis relies on the concept of the Best Linear Approximation (BLA).

Therefore, in Chapter 2 the BLA is first introduced in an intuitive way, and then rigorously

defined for SISO and MIMO nonlinear systems. Furthermore, some interesting properties of

multisine excitation signals with respect to the qualification and quantification of nonlinear

behaviour are reviewed. Finally, we explain how the BLA should be estimated, for both non

periodic and periodic input/output data. As will become clear in Chapter 2, periodic excitations

are preferred, since in that case more information can be extracted from the Device Under

Test.

The tools described in Chapter 2 are applied to a number of Digital Signal Processing (DSP)

algorithms in Chapter 3. A measurement technique is proposed to characterize the non-

idealities of DSP algorithms which are induced by quantization effects, overflows, or other

nonlinear effects. The main idea is to apply specially designed excitations such that a

distinction can be made between the output of the ideal system and the contributions of the

system’s non-idealities. The proposed method is applied to digital filtering and to an audio

compression codec.

In Chapter 4, an identification procedure is presented for a specific kind of block-oriented

model: the Nonlinear Feedback model. By estimating the Best Linear approximation of the

system and by rearranging the model’s structure, the identification of the feedback model

parameters is reduced to a linear problem. The numerical parameter values obtained by

solving the linear problem are then used as starting values for a nonlinear optimization

procedure. The proposed method is illustrated on measurements obtained from a physical

system.

Chapter 5 introduces the Polynomial Nonlinear State Space model (PNLSS) and studies its

approximation capabilities. Next, a link is established between this model and a number of

classical block-oriented models, such as Hammerstein and Wiener models. Furthermore, by

means of two simple examples, we illustrate that the proposed model class is broader than

the Volterra framework. In the last part of Chapter 5, a general identification procedure is

presented which utilizes the Best Linear Approximation of the nonlinear system. Next,


frequency domain subspace identification is employed to initialize the PNLSS model. The

identification of the full PNLSS model is then regarded as a nonlinear optimization problem.

In Chapter 6, the proposed identification procedure is applied to measurements from various

real-life systems. The SISO test cases comprise three electronic circuits (the Silverbox, a

Wiener-Hammerstein system and a RF crystal detector), and two mechanical set-ups (a

quarter car set-up and a robot arm). Furthermore, two mechanical MISO applications are

discussed (a combine harvester and a semi-active magneto-rheological damper).

Finally, Chapter 7 deals with the conclusions and some ideas on further research.


1.5 Contributions

The main goal of this thesis is to study and design tools which allow the practicing engineer to

qualify, to understand and to model nonlinear systems. In this context, the contributions of

this thesis are:

• The characterization of DSP systems/algorithms via the Best Linear Approximation and

multisine excitation signals.

• A method to generate starting values for a block-oriented, Nonlinear Feedback model

with a static nonlinearity in the feedback loop.

• A method that initializes the Polynomial NonLinear State Space (PNLSS) model by

means of the BLA of the Device Under Test.

• The establishment of a link between the PNLSS model structure and five classical block-

oriented models.

• The application of the proposed identification method to several real-life measurement

problems.


1.6 Publication List

Chapter 3 was published as

• J. Paduart, J. Schoukens, Y. Rolain. Fast Measurement of Quantization Distortions in DSP

Algorithms. IEEE Transactions on Instrumentation and Measurement, vol. 56, no. 5,

pp. 1917-1923, 2007.

The major part of Chapter 4 was presented at the Nolcos 2004 conference:

• J. Paduart, J. Schoukens. Fast Identification of systems with nonlinear feedback.

Proceedings of the 6th IFAC Symposium on Nonlinear Control Systems, Stuttgart, Germany,

pp. 525-529, 2004.

The comparative study between the PNLSS model and the block-oriented models from

Chapter 5 was presented at IMTC 2007:

• J. Paduart, J. Schoukens, L. Gommé. On the Equivalence between some Block-oriented

Nonlinear Models and the Nonlinear Polynomial State Space Model. Proceedings of the

IEEE Instrumentation and Measurement Technology Conference, Warsaw, Poland, pp. 1-6,

2007.

The identification of the PNLSS model and its application to two real-life set-ups was

presented at the SYSID 2006 conference:

• J. Paduart, J. Schoukens, R. Pintelon, T. Coen. Nonlinear State Space Modelling of

Multivariable Systems. Proceedings of the 14th IFAC Symposium on System Identification,

Newcastle, Australia, pp. 565-569, 2006.

The application of the PNLSS model to a quarter car set-up led to the following publication:


• J. Paduart, J. Schoukens, K. Smolders, J. Swevers. Comparison of two different

nonlinear state-space identification algorithms. Proceedings of the International Conference

on Noise and Vibration Engineering, Leuven, Belgium, pp. 2777-2784, 2006.

Finally, the cooperation with colleagues from the ELEC Department (Vrije Universiteit Brussel),

the PMA and BIOSYST-MeBioS Department (KULeuven) resulted in the following publications:

• J. Schoukens, J. Swevers, J. Paduart, D. Vaes, K. Smolders, R. Pintelon. Initial estimates

for block structured nonlinear systems with feedback. Proceedings of the International

Symposium on Nonlinear Theory and its Applications, Brugge, Belgium, pp. 622-625, 2005.

• J. Schoukens, R. Pintelon, J. Paduart, G. Vandersteen. Nonparametric Initial Estimates

for Wiener-Hammerstein systems. Proceedings of the 14th IFAC Symposium on System

Identification, Newcastle, Australia, pp. 778-783, 2006.

• T. Coen, J. Paduart, J. Anthonis, J. Schoukens, J. De Baerdemaeker. Nonlinear system

identification on a combine harvester. Proceedings of the American Control Conference,

Minneapolis, Minnesota, USA, pp. 3074-3079, 2006.


CHAPTER 2

THE BEST LINEAR APPROXIMATION

In this chapter, we introduce the Best Linear Approximation in an

intuitive way. Next, the excitation signals used throughout this thesis

are presented, followed by a formal definition of the Best Linear

Approximation. We then explain how the properties of a multisine

excitation signal can be exploited to quantify and qualify the nonlinear

behaviour of a system. Finally, we show how the Best Linear

Approximation of a nonlinear system can be obtained.


2.1 Introduction

As was shown in the introductory chapter, linear models have many attractive properties.

Therefore, it can be useful to approximate nonlinear systems by linear models. Since we are

dealing with an approximation, model errors will be present. Hence, a framework needs to be

selected in order to decide in which sense the approximate linear model is optimal. We will

use a classical approach and minimize the errors in mean square sense.

Definition 2.1 (Best Linear Approximation) The Best Linear Approximation (BLA) is defined as the model $G$ belonging to the set of linear models $\mathbb{G}$, such that

$$G_{\mathrm{BLA}} = \arg\min_{G \in \mathbb{G}} \mathbb{E}\left\{ \left| y(t) - G(u(t)) \right|^2 \right\},$$   (2-1)

where $u(t)$ and $y(t)$ are the input and output of the nonlinear system, respectively.

In general, the Best Linear Approximation $G_{\mathrm{BLA}}$ of a nonlinear system depends on the amplitude distribution, the power spectrum, and the higher order moments of the stochastic input $u(t)$ [56],[19],[21]. The amplitude dependency is illustrated by means of a short simulation example.

Example 2.2 Consider the following static nonlinear system:

$$y = \tanh(u),$$   (2-2)

and three white excitation signals drawn from different distributions: a uniform, a Gaussian, and a binary distribution (Figure 2-1 (a)). The parameters of these distributions are chosen such that their variance is equal to one. Figure 2-1 (b) shows the BLA (grey) for the three distributions, together with the static nonlinearity (black). In this set-up, the BLA is a straight line through the origin. Note that in general, a static nonlinearity does not necessarily have a static BLA [19]. It can be seen from Figure 2-1 (b) that the slope $g$ of the BLA changes from distribution to distribution.

From this example, it is clear that the properties of the input are of paramount importance when the BLA of a nonlinear system is determined. That is why we will start by discussing the class of excitation signals used throughout this thesis. Next, a formal definition of the Best


Linear Approximation is given. Then, we will demonstrate how multisine signals can be used

to quantify and qualify the nonlinear behaviour of the Device Under Test (DUT). Finally, we

will show how the BLA can be determined for SISO and MIMO systems.

Figure 2-1. (a) Three different probability density functions; (b) Static nonlinearity (black) and BLA (grey). The BLA slope is $g = 0.67$ for the uniform, $g = 0.61$ for the Gaussian, and $g = 0.76$ for the binary distribution.
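A minimal Monte Carlo sketch of Example 2.2 (my own illustration; the sample size is arbitrary): for a static nonlinearity driven by zero-mean white noise, the least squares slope of the BLA is $\mathbb{E}\{u\,y\}/\mathbb{E}\{u^2\}$, which should reproduce, up to Monte Carlo noise, the slopes reported in Figure 2-1 (b).

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000

# three zero-mean, unit-variance input distributions
inputs = {
    "uniform":  rng.uniform(-np.sqrt(3), np.sqrt(3), N),
    "gaussian": rng.standard_normal(N),
    "binary":   rng.choice([-1.0, 1.0], N),
}

for name, u in inputs.items():
    y = np.tanh(u)                       # static nonlinear system (2-2)
    g = np.mean(u * y) / np.mean(u**2)   # least squares slope of the BLA
    print(f"{name:8s}: g = {g:.2f}")
# expected (up to Monte Carlo noise): uniform ~0.67, gaussian ~0.61, binary ~0.76
```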


2.2 Class of Excitation Signals

Since the Best Linear Approximation of a nonlinear system depends on the properties of the

applied input signal, it is important to define the kind of excitations that will be employed, and

to discuss their properties. In this thesis, we will utilize the class $\mathbb{U}$ of Gaussian excitation signals with a user-defined power spectrum. Furthermore, it is required that the signals are stationary such that their power spectrum is well defined. Three excitation signals that are commonly used belong to $\mathbb{U}$: Gaussian noise, Gaussian periodic noise, and random phase multisines (Figure 2-2). For periodic noise and random phase multisines, this membership is only asymptotic, i.e., for the number of excited frequency components going to infinity ($N \to \infty$). We will restrict ourselves to Gaussian noise and random phase multisines. We will give a definition and a brief overview of some of the properties of these signals.

2.2.1 Random Phase Multisine

Definition 2.3 (Random Phase Multisine) A random phase multisine is a periodic signal, defined as a sum of harmonically related sine waves:

$$u(t) = \frac{1}{\sqrt{N}} \sum_{k=-N}^{N} U_k \, e^{j\left(2\pi f_{\max} \frac{k}{N} t + \phi_k\right)}$$   (2-3)

with $\phi_{-k} = -\phi_k$, $U_k = U_{-k} = U\!\left(\frac{k f_{\max}}{N}\right)$, and $f_{\max}$ the maximum frequency of the excitation signal. The amplitudes $U_k$ are chosen in a custom fashion, according to the user-defined power spectrum $U(f)$ that should be realized. The phases $\phi_k$ are the realizations of an independent distributed random process such that $\mathbb{E}\{e^{j\phi_k}\} = 0$.

Figure 2-2. Class of excitation signals (the set of Gaussian signals contains Gaussian noise, Gaussian periodic noise, and random phase multisines*). *: asymptotic result, for the number of excited frequency components going to infinity.


The factor $1/\sqrt{N}$ serves as normalization such that, asymptotically ($N \to \infty$), the power of the multisine remains finite, and its Root Mean Square (RMS) value stays constant as $N$ increases. A typical choice is to take $\phi_k$ uniformly distributed over $[0, 2\pi)$, but for instance discrete phase distributions can be used as well, as long as $\mathbb{E}\{e^{j\phi_k}\} = 0$ holds. Note that the random phase multisine is asymptotically normally distributed ($N \to \infty$), but in practice 20 excited lines already work very well for smoothly varying amplitude distributions [56],[62]. Next, we illustrate some of the properties of a random phase multisine signal with a short example.

Example 2.4 We consider a random phase multisine with $N = 128$, a flat power spectrum and $f_{\max} = 0.25$ Hz. Figure 2-3 (a) shows the histogram of this signal together with a theoretic Gaussian pdf having the same variance and expected value. Figure 2-3 (b) and (c) show the multisine in the time and frequency domain, respectively. From these plots, we see that the random phase multisine has a noisy behaviour in the time domain, and perfectly realizes the user-defined amplitude spectrum in the frequency domain.

The main advantage of random phase multisines is the fact that their periodicity can be exploited to distinguish the measurement noise from the nonlinear distortions [15]. Further in this chapter, we will go into detail about this property. A drawback is the need to introduce a settling time for the transients, which is common to periodic excitation signals.

Figure 2-3. Some properties of a Random Odd, Random Phase Multisine: (a) Histogram (black) and theoretic Gaussian pdf (grey), (b) Time domain and (c) Frequency domain representation (DFT spectrum).
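A compact sketch of a flat-spectrum random phase multisine in the spirit of Definition 2.3 and Example 2.4 (an illustration of mine, not the thesis' implementation; bin and sample counts are arbitrary): equal-amplitude lines with independent uniform phases are placed on the excited DFT bins and transformed to the time domain.

```python
import numpy as np

def random_phase_multisine(n_samples, n_lines, rms=1.0, rng=None):
    """Flat-spectrum random phase multisine with n_lines excited harmonics (DC excluded)
    and independent phases, uniform in [0, 2*pi). Returns one period of n_samples samples."""
    rng = np.random.default_rng(rng)
    half = np.zeros(n_samples // 2 + 1, dtype=complex)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_lines)
    half[1:n_lines + 1] = np.exp(1j * phases)    # equal-amplitude excited lines
    u = np.fft.irfft(half, n=n_samples)          # real, periodic time signal
    return rms * u / np.sqrt(np.mean(u**2))      # scale to the requested RMS value

u = random_phase_multisine(n_samples=512, n_lines=128, rms=1.0, rng=0)
U = np.fft.rfft(u) / len(u)                      # DFT: energy only on the excited bins
```

Because the signal is built directly on a DFT grid, the realized amplitude spectrum is exactly the user-defined one, and one period of the signal is returned.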


2.2.2 Gaussian Noise

Definition 2.5 (Gaussian Noise) A Gaussian noise signal is a random sequence

drawn from a Gaussian distribution with a user-defined power spectral density.

Example 2.6 An example of a Gaussian noise signal is shown in Figure 2-4 (b). To

generate this sequence, a signal of $N = 128$ samples was drawn from a normal

distribution. In order to achieve the same bandwidth as the random phase multisine

from Figure 2-3, the signal was filtered using a 6th order Butterworth filter with a cut-off

frequency of 0.25 Hz. Finally, to obtain the same RMS value as for the multisine

example, the amplitude of the filtered sequence was normalized. Figure 2-4 (a) shows

the histogram of this signal together with a theoretic Gaussian pdf. In Figure 2-4 (c), the

DFT spectrum of the sequence is plotted.

The DFT spectrum of the Gaussian noise contains dips that can lead to unfavourable results

such as a low SNR at some frequencies. Two more disadvantages are associated with the non

periodic nature of random Gaussian noise. First of all, no distinction can be made between the

measurement noise and the nonlinear distortions using simple tools. Secondly, leakage errors

are present when computing the DFT of this signal. Note that when comparing Figure 2-3 and

Figure 2-4, it is impossible to distinguish a random phase multisine and a random Gaussian

noise sequence based on their histogram and time domain waveform.

Figure 2-4. Some properties of filtered random Gaussian noise: (a) Histogram (black) and Gaussian pdf (grey), (b) Time domain and (c) Frequency domain representation (DFT spectrum).
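A corresponding sketch for Example 2.6 (again my own illustration; the scipy filter design is a stand-in for whatever implementation was actually used): white Gaussian noise is shaped by a 6th order Butterworth low-pass filter and rescaled to the desired RMS value.

```python
import numpy as np
from scipy.signal import butter, lfilter

def filtered_gaussian_noise(n_samples, cutoff, fs, rms=1.0, order=6, rng=None):
    """Band-limited Gaussian noise: white noise through an order-N Butterworth low-pass."""
    rng = np.random.default_rng(rng)
    b, a = butter(order, cutoff, fs=fs)          # digital low-pass, cutoff in Hz
    e = rng.standard_normal(n_samples)
    u = lfilter(b, a, e)
    return rms * u / np.sqrt(np.mean(u**2))      # normalize to the requested RMS

u = filtered_gaussian_noise(n_samples=512, cutoff=0.25, fs=1.0, rms=1.0, rng=0)
```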


2.3 Properties of the Best Linear Approximation

2.3.1 Single Input, Single Output Systems

Consider a noiseless Single Input, Single Output (SISO) nonlinear system $S$ with an input $u$ and an output $y$ (see Figure 2-5 (a)). We make the following assumption on $S$.

Assumption 2.7 There exists a uniformly bounded Volterra series of which the output converges in mean square sense to the output $y$ of $S$ for $u \in \mathbb{U}$.

These systems are also called Wiener, or PISPOT systems. This class includes discontinuities like quantizers or relays, and excludes chaotic behaviour or systems with bifurcations.

Theorem 2.8 If $S$ satisfies Assumption 2.7, it can be modelled as the sum of a linear system $G_{\mathrm{BLA}}(j\omega)$, called the Best Linear Approximation, and a noise source $y_s$. The Best Linear Approximation is calculated as

$$G_{\mathrm{BLA}}(j\omega) = \frac{S_{yu}(j\omega)}{S_{uu}(j\omega)},$$   (2-4)

where $S_{uu}(j\omega)$ is the auto-power spectrum of the input, and $S_{yu}(j\omega)$ the cross-power spectrum between the output and the input.

Relation (2-4) is obtained by calculating the Fourier transform of the Wiener-Hopf equation, which in turn follows from equation (2-1) (see for instance [24] for this classic result). Note that by using equation (2-4) no causality is imposed on $G_{\mathrm{BLA}}(j\omega)$. In [53], it was shown that the BLAs for Gaussian noise and for random phase multisines with an equivalent power spectrum are asymptotically identical. The BLA for random phase multisines converges to $G_{\mathrm{BLA}}(j\omega)$ as the number of frequency components $N$ goes to infinity:

$$G_{\mathrm{BLA},N}(j\omega) = G_{\mathrm{BLA}}(j\omega) + O(N^{-1}).$$   (2-5)

The Best Linear Approximation for a nonlinear SISO system is illustrated in Figure 2-5 (b). The noise source $y_s$ represents that part of the output $y$ that cannot be captured by the linear model $G_{\mathrm{BLA}}(j\omega)$. Hence, for frequency $\omega_k$ we have that


$$Y(j\omega_k) = G_{\mathrm{BLA}}(j\omega_k)\, U(j\omega_k) + Y_s(j\omega_k).$$   (2-6)

$Y_s(j\omega_k)$ depends on the particular input realization and exhibits a stochastic behaviour from realization to realization, with $\mathbb{E}\{Y_s(j\omega_k)\} = 0$. Hence, $G_{\mathrm{BLA}}(j\omega)$ can be determined by averaging the system's response over several input realizations.

Figure 2-5. (a) SISO nonlinear system vs. (b) its alternative representation.
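To make (2-4) and (2-6) concrete, here is a simplified sketch (not the estimation procedure of Section 2.5; the toy system and all parameters are invented for illustration): the cross- and auto-power spectra are averaged over several independent input realizations, and their ratio gives an estimate of $G_{\mathrm{BLA}}$.

```python
import numpy as np
from scipy.signal import lfilter

def estimate_bla(u, y):
    """u, y: arrays of shape (M, N), one realization per row.
    Returns G_BLA(k) ~ <Y(k) U*(k)> / <|U(k)|^2>, cf. eq. (2-4)."""
    U = np.fft.rfft(u, axis=1)
    Y = np.fft.rfft(y, axis=1)
    Syu = np.mean(Y * U.conj(), axis=0)    # cross-power spectrum estimate
    Suu = np.mean(np.abs(U) ** 2, axis=0)  # auto-power spectrum estimate
    return Syu / Suu

rng = np.random.default_rng(0)
M, N = 50, 1024                                    # realizations x samples per realization
u = rng.standard_normal((M, N))
lin = lfilter([0.5, 0.3, 0.1], [1.0], u, axis=1)   # underlying linear dynamics (FIR)
y = lin + 0.05 * lin**3                            # weak odd nonlinear distortion
G_bla = estimate_bla(u, y)                         # complex FRF estimate on the rfft grid
```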

2.3.2 Multiple Input, Multiple Output Systems

In [17], the Best Linear Approximation was extended to a Multiple Input, Multiple Output (MIMO) framework. Consider a noiseless nonlinear system $S$ with $n_u$ inputs $u_i$ ($i = 1, \ldots, n_u$) and $n_y$ outputs $y_j$ ($j = 1, \ldots, n_y$). As in the SISO case, $S$ needs to fulfil some conditions in order to define the Best Linear Approximation.

Assumption 2.9 For all $i, j$, there exists a uniformly bounded MIMO Volterra series of which the outputs converge in mean square sense to the outputs $y_j$ of $S$ for $u_i \in \mathbb{U}$.


Theorem 2.10 If $S$ satisfies Assumption 2.9, it can be modelled as the sum of a linear system $G_{\mathrm{BLA}}(j\omega)$, and $n_y$ noise sources $y_s^{(j)}$. The Best Linear Approximation of $S$ is calculated as

$$G_{\mathrm{BLA}}(j\omega) = S_{yu}(j\omega)\, S_{uu}^{-1}(j\omega),$$   (2-7)

where $S_{uu}(j\omega) \in \mathbb{C}^{n_u \times n_u}$ is the auto-power spectrum of the inputs, and $S_{yu}(j\omega) \in \mathbb{C}^{n_y \times n_u}$ is the cross-power spectrum between the outputs and the inputs.

Also for MIMO systems, it was proven that the Best Linear Approximation for Gaussian noise and random phase multisines is asymptotically equivalent [17].

Figure 2-6 (a) shows a nonlinear MIMO system with $n_u$ inputs and $n_y$ outputs. In Figure 2-6 (b), the alternative representation of this system is given: the Best Linear Approximation $G_{\mathrm{BLA}}(j\omega)$ together with the $n_y$ stochastic nonlinear noise sources $Y_s^{(j)}(j\omega_k)$.

Figure 2-6. (a) MIMO nonlinear system vs. (b) its alternative representation.

For frequency $\omega_k$, the following equation holds:

$$Y(j\omega_k) = G_{\mathrm{BLA}}(j\omega_k)\, U(j\omega_k) + Y_s(j\omega_k),$$   (2-8)


with $Y(j\omega_k) \in \mathbb{C}^{n_y \times 1}$, $G_{\mathrm{BLA}}(j\omega_k) \in \mathbb{C}^{n_y \times n_u}$, $U(j\omega_k) \in \mathbb{C}^{n_u \times 1}$, and $Y_s(j\omega_k) \in \mathbb{C}^{n_y \times 1}$. Here also, we have that $\mathbb{E}\{Y_s(j\omega_k)\} = 0$.
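A bare-bones sketch of (2-7) (an illustration of mine; the actual MIMO estimation procedure follows in Section 2.5.2): given the input and output DFT spectra of $M$ independent realizations, the auto- and cross-power spectra are averaged per frequency and the BLA is $S_{yu} S_{uu}^{-1}$.

```python
import numpy as np

def mimo_bla(U, Y):
    """U: (M, n_u, F) and Y: (M, n_y, F) DFT spectra over M realizations.
    Returns G_BLA with shape (F, n_y, n_u); requires M >= n_u so that S_uu is invertible."""
    M, n_u, F = U.shape
    n_y = Y.shape[1]
    G = np.empty((F, n_y, n_u), dtype=complex)
    for k in range(F):
        Uk, Yk = U[:, :, k], Y[:, :, k]
        Suu = Uk.T @ Uk.conj() / M    # estimate of E{U U^H}, (n_u x n_u)
        Syu = Yk.T @ Uk.conj() / M    # estimate of E{Y U^H}, (n_y x n_u)
        G[k] = Syu @ np.linalg.inv(Suu)
    return G
```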


2.4 Some Properties of Nonlinear Systems

From Definition 2.3, we know that the amplitude spectrum of a random phase multisine can

be customized. We will demonstrate that this freedom can be used to detect, quantify and

qualify the nonlinear behaviour of devices that satisfy Assumption 2.7. The tools developed

here will be used extensively in Chapter 3 to characterize Digital Signal Processing (DSP)

algorithms.

2.4.1 Response to a Sine Wave

Figure 2-7 shows the response of a linear (a) and a nonlinear (b) system to a sine wave. The

output of the linear system consists of a sine wave, possibly with a modified amplitude and

phase. The output spectrum of the nonlinear system, in general, contains additional spectral

components, harmonically related to the input sine wave. Hence, the spectral components on

the non excited spectral lines indicate the level of nonlinear behaviour of the DUT.

Figure 2-7. Response of (a) a linear system and (b) a nonlinear system to a sine wave.

2.4.2 Even and Odd Nonlinear Behaviour

Using this principle, we can also retrieve qualitative information about the nonlinear behaviour of the DUT. In Figure 2-8, two kinds of nonlinear systems are considered. In (a), an even nonlinear system is excited with a sine wave with a frequency $f_0$. The output spectrum of this system only contains contributions on the even harmonic lines (green arrows). This is due


to the fact that the output spectrum of an even nonlinear system only contains even combinations of the input frequencies (e.g. $f_0 + f_0$, $f_0 - f_0$, ...). For an odd nonlinear system (b), the converse is true: its output spectrum consists of components on the odd frequency lines (red arrows). Here, the output spectrum contains only odd combinations of the input frequencies (e.g. $f_0 + f_0 + f_0$, $f_0 + f_0 - f_0$, ...).

Figure 2-8. Response of (a) an even nonlinear system (e.g. $y = e^{-u^2}$) and (b) an odd nonlinear system (e.g. $y = \tanh(u)$) to a sine wave.

2.4.3 The Multisine as a Detection Tool for Nonlinearities

Consider now the case of a multisine signal, applied to a nonlinear system (see Figure 2-9). We will show that by carefully choosing the spectrum of the multisine, we can qualify and quantify the nonlinear behaviour of the DUT. For a multisine having only odd frequency components ($f_0$, $3f_0$, $5f_0$, ...), the even nonlinearities will only generate spectral components at even frequencies, since an even combination of odd frequencies always yields an even frequency. Hence, the even frequency lines at the output can be used to detect even nonlinear behaviour of the DUT. Furthermore, odd combinations of odd frequency lines always


result in odd frequency lines. Hence, when some of the odd frequency lines are not excited,

they can serve to detect odd nonlinear behaviour of the DUT [56].

Depending on which frequency lines are used to detect the nonlinear behaviour, several kinds of random phase multisines can be distinguished:

• Full multisine: all frequencies up to $f_{max}$ are excited,

  $U_k = A$  for  $k = 1, \ldots, N$.    (2-9)

• Odd multisine: only the odd lines are excited,

  $U_k = \begin{cases} A & \text{for } k = 2n+1, \; k \le N, \; n \in \mathbb{N} \\ 0 & \text{elsewhere} \end{cases}$    (2-10)

  Since even nonlinearities only contribute to the even harmonic output lines, they do not disturb the FRF measurements in this case. Hence, a lower uncertainty is achieved for the BLA [64].

• Random odd multisine: this is an odd multisine where the odd frequency lines are divided into groups with a predefined length, for instance a block length of 4. In each block, all odd lines are excited ($U_k = A$) except for one line, which serves as a detection line for the odd nonlinearities. The frequency index of this line is randomly selected in each consecutive block [57] (see the construction sketch below). This is the best way to reveal the nonlinear behaviour of the DUT; hence, this type of multisine should be the default option when employing random phase multisines.

The ability to analyse the nonlinear behaviour of the DUT comes at a price: the frequency resolution diminishes or, equivalently, the measurement time increases.
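To make the construction of the random odd grid concrete, the following sketch (a minimal NumPy illustration under assumed conventions such as the flat amplitude and the FFT scaling; it is not code from this thesis) generates one phase realization of a random odd multisine with a block length of 4.

    import numpy as np

    def random_odd_multisine(N, k_max, block=4, amp=1.0, seed=0):
        # N: samples per period, k_max: highest excited harmonic (defines f_max)
        rng = np.random.default_rng(seed)
        odd = np.arange(1, k_max + 1, 2)                      # odd harmonic indices
        excited = []
        for group in np.array_split(odd, int(np.ceil(len(odd) / block))):
            drop = rng.integers(len(group))                   # detection line of this group
            excited.extend(np.delete(group, drop))
        excited = np.asarray(excited)

        # flat amplitude A and uniformly distributed random phases on the excited grid
        U = np.zeros(N, dtype=complex)
        U[excited] = amp * np.exp(2j * np.pi * rng.random(len(excited)))
        u = 2 * np.real(np.fft.ifft(U))                       # one period of the real signal
        return u, excited

The unexcited odd lines returned implicitly (the dropped index per group) act as odd detection lines, while all even lines act as even detection lines.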


2.5 Estimating the Best Linear Approximation

In the last part of this chapter, we explain how the Best Linear Approximation of a nonlinear system can be determined. First, single input, single output systems are treated. Next, the results are extended to MIMO systems. Depending on the kind of excitation signals used during the experiments, the BLA is calculated differently. Therefore, we will make a distinction between periodic and non periodic data.

2.5.1 Single Input, Single Output Systems

A. Periodic Data

When using periodic excitation signals to determine the BLA of a SISO system, the experiments should be carried out according to the scheme depicted in Figure 2-10 [15]. In total, $M$ different random phase multisines are applied, as shown on the vertical axis. After the transients have settled, we measure for each experiment $P$ periods of the input and the output. Per experiment $m$ and period $p$, we compute the DFT of the input ($U(k)^{[m,p]} \in \mathbb{C}$) and the output ($Y(k)^{[m,p]} \in \mathbb{C}$). Then, the spectra are averaged over the $P$ periods. For frequency $k$, we obtain:

$U(k)^{[m]} = \frac{1}{P} \sum_{p=1}^{P} U(k)^{[m,p]}, \qquad Y(k)^{[m]} = \frac{1}{P} \sum_{p=1}^{P} Y(k)^{[m,p]}$    (2-11)

For every experiment $m$, we then calculate the FRF estimate $G^{[m]}(j\omega_k) \in \mathbb{C}$:

$G^{[m]}(j\omega_k) = \frac{Y(k)^{[m]}}{U(k)^{[m]}},$    (2-12)

which is equivalent to (2-4) for periodic excitations. Next, the $M$ FRF estimates $G^{[m]}(j\omega_k)$ are combined in order to obtain the Best Linear Approximation $G_{BLA}(j\omega_k)$:

$G_{BLA}(j\omega_k) = \frac{1}{M} \sum_{m=1}^{M} G^{[m]}(j\omega_k)$    (2-13)


Furthermore, due to the periodic nature of the excitation signals, the effect of the nonlinear distortions and the measurement noise on the Best Linear Approximation can be distinguished from each other. The variations over the $P$ periods stem from the measurement noise, while the variations over the $M$ experiments are due to the combined effect of the measurement noise and the stochastic nonlinear behaviour. Note that non-stationary disturbances such as non-synchronous periodic signals can also be detected using the measurement scheme from Figure 2-10 [15].

Figure 2-10. Experiment design to calculate the BLA of a SISO nonlinear system: $M$ experiments (vertical axis), each consisting of a transient part followed by $P$ measured periods $U^{[m,p]}, Y^{[m,p]}$, from which $G^{[m]}$ and $\sigma_n^{2[m]}$ are computed.

First, we will determine the sample variance of $G_{BLA}(j\omega_k)$ due to the measurement noise. A straightforward way to achieve this is to calculate the FRFs per period:

$G^{[m,p]}(j\omega_k) = \frac{Y(k)^{[m,p]}}{U(k)^{[m,p]}},$    (2-14)

and to employ $G^{[m,p]}(j\omega_k)$ to calculate the sample variance $\sigma_n^{2[m]}$ of $G^{[m]}(j\omega_k)$, which is then given by

$\sigma_n^{2[m]} = \frac{1}{P(P-1)} \sum_{p=1}^{P} \left| G^{[m,p]}(j\omega_k) - G^{[m]}(j\omega_k) \right|^2.$    (2-15)


The drawback of this approach is that in equation (2-14) raw input data are employed, without increasing the SNR by averaging over the periods. If the SNR of the input is low, the estimates $G^{[m,p]}(j\omega_k)$ will be of poor quality: a non negligible bias is present (SNR < 10 dB, [55]) as well as a high uncertainty (SNR < 20 dB, [54]). A better option to determine $\sigma_n^{2[m]}$ is to use the covariance information from the averaged input and output spectra, and to apply a first order approximation [56]. First, we calculate the sample variances and covariance of the estimated spectra $U(k)^{[m]}$ and $Y(k)^{[m]}$:

$\sigma_U^{2[m]} = \frac{1}{P-1} \sum_{p=1}^{P} \left| U^{[m,p]} - U^{[m]} \right|^2$

$\sigma_Y^{2[m]} = \frac{1}{P-1} \sum_{p=1}^{P} \left| Y^{[m,p]} - Y^{[m]} \right|^2$    (2-16)

$\sigma_{YU}^{2[m]} = \frac{1}{P-1} \sum_{p=1}^{P} \left( Y^{[m,p]} - Y^{[m]} \right) \overline{\left( U^{[m,p]} - U^{[m]} \right)}$

In (2-16), the frequency index $k$ was omitted in order to simplify the formulas. From $\sigma_U^{2[m]} \in \mathbb{R}$, $\sigma_Y^{2[m]} \in \mathbb{R}$, and $\sigma_{YU}^{2[m]} \in \mathbb{C}$, the sample variance $\sigma_n^{2[m]}$ of $G^{[m]}(j\omega_k)$ can be approximated by [56]:

$\sigma_n^{2[m]} = \frac{\left| G^{[m]} \right|^2}{P} \left( \frac{\sigma_Y^{2[m]}}{\left| Y^{[m]} \right|^2} + \frac{\sigma_U^{2[m]}}{\left| U^{[m]} \right|^2} - 2\,\mathrm{Re}\!\left( \frac{\sigma_{YU}^{2[m]}}{Y^{[m]}\, \overline{U^{[m]}}} \right) \right),$    (2-17)

with $\sigma_n^{2[m]} \in \mathbb{R}$. In a noiseless input framework, expression (2-17) simplifies to

$\sigma_n^{2[m]} = \frac{1}{P}\, \frac{\sigma_Y^{2[m]}}{\left| U^{[m]} \right|^2}.$    (2-18)

Next, the $M$ estimates $\sigma_n^{2[m]}$ are averaged in order to acquire an improved estimate:

$\frac{1}{M} \sum_{m=1}^{M} \sigma_n^{2[m]}.$    (2-19)

After applying the $\sqrt{N}$-law to the averaging over the $M$ experiments, we obtain the uncertainty of $G_{BLA}$ due to the measurement noise:


$\sigma_n^2 = \frac{1}{M^2} \sum_{m=1}^{M} \sigma_n^{2[m]}$    (2-20)

Furthermore, the combined effect of the stochastic nonlinear behaviour and the measurement noise can also be measured by calculating the total sample variance $\sigma_{BLA}^2$. It is determined from the $M$ estimates $G^{[m]}(j\omega_k)$:

$\sigma_{BLA}^2 = \frac{1}{M(M-1)} \sum_{m=1}^{M} \left| G^{[m]} - G_{BLA} \right|^2$    (2-21)

The total variance of the BLA is equal to the sum of the measurement noise variance and the variance due to the stochastic nonlinear contributions $\sigma_{NL}^2(k)$:

$\sigma_{BLA}^2(k) = \sigma_{NL}^2(k) + \sigma_n^2(k)$    (2-22)

Hence, $\sigma_{NL}^2(k)$ is estimated with

$\sigma_{NL}^2(k) = \sigma_{BLA}^2(k) - \sigma_n^2(k).$    (2-23)

To conclude, for frequency index $k$ we now have (see also the numerical sketch below):

• $G_{BLA}(j\omega_k)$: the Best Linear Approximation.

• $\sigma_{BLA}^2(k)$: the total sample variance of the estimate $G_{BLA}(j\omega_k)$, due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

• $\sigma_n^2(k)$: the measurement noise sample variance of the estimate $G_{BLA}(j\omega_k)$.

• $\sigma_{NL}^2(k)$: the sample variance of the stochastic nonlinear contributions.
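The following sketch illustrates the bookkeeping of (2-11) to (2-23) in NumPy. It is only an illustration under assumed conventions (array layout with $M$ experiments, $P$ periods and $N$ samples per period, and the FFT scaling), not code from this thesis.

    import numpy as np

    def bla_siso_periodic(u, y):
        # u, y: arrays of shape (M, P, N): M experiments, P periods, N samples per period
        M, P, N = u.shape
        U = np.fft.fft(u, axis=2) / N                 # DFT per experiment and period
        Y = np.fft.fft(y, axis=2) / N
        Um, Ym = U.mean(axis=1), Y.mean(axis=1)       # average over the P periods, (2-11)
        G = Ym / Um                                   # FRF per experiment, (2-12)
        G_bla = G.mean(axis=0)                        # Best Linear Approximation, (2-13)

        # sample (co)variances of the averaged spectra, (2-16)
        dU, dY = U - Um[:, None, :], Y - Ym[:, None, :]
        varU = (np.abs(dU) ** 2).sum(axis=1) / (P - 1)
        varY = (np.abs(dY) ** 2).sum(axis=1) / (P - 1)
        covYU = (dY * dU.conj()).sum(axis=1) / (P - 1)

        # noise variance of each G^[m] via the first order approximation (2-17),
        # combined over the M experiments as in (2-20)
        var_n_m = np.abs(G) ** 2 / P * (varY / np.abs(Ym) ** 2 + varU / np.abs(Um) ** 2
                                        - 2 * np.real(covYU / (Ym * Um.conj())))
        var_n = var_n_m.sum(axis=0) / M ** 2

        # total variance over the experiments (2-21) and its nonlinear part (2-23)
        var_bla = (np.abs(G - G_bla) ** 2).sum(axis=0) / (M * (M - 1))
        return G_bla, var_bla, var_n, var_bla - var_n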


B. Non Periodic Data

A combination of higher order correlation tests can be used to detect unmodelled nonlinearities for arbitrary excitations [20]. In contrast to the case of periodic excitation signals, however, no simple methods exist to distinguish between the nonlinear distortions and the effect of measurement noise when non periodic excitations are used. Furthermore, leakage errors will be present when calculating the input and output DFT spectra [53]. However, it is still possible to characterize the combined effect of nonlinear distortions and measurement noise if we assume a noiseless input framework.

First, the measured input and output time domain data are split into $M$ blocks. In order to reduce the leakage effect, a Hanning or a diff window can for instance be applied to the signals; we will employ the latter [65]. Then, the input and output DFT spectra of each block are calculated. The next step is to calculate the sample cross-power spectrum $S_{YU}(k)$ between the output and the input, and the auto-power spectrum $S_{UU}(k)$ of the input, using Welch's method [84]:

$S_{XY} = \frac{1}{M} \sum_{m=1}^{M} X^{[m]}\, Y^{[m]H}.$    (2-24)

The BLA is then given by

$G_{BLA}(j\omega_k) = \frac{S_{YU}(k)}{S_{UU}(k)}.$    (2-25)

When working in a noiseless input framework, it can be shown that the following expression yields an unbiased estimate of the variance of the output DFT spectrum $Y(k)$:

$\sigma_Y^2(k) = \frac{M}{2(M-1)} \left( S_{YY}(k) - G_{BLA}(j\omega_k)\, S_{UY}(k) \right),$    (2-26)

where the factor 2 stems from the usage of the diff window. From this expression, we can derive the uncertainty of $G_{BLA}(j\omega_k)$ (see the end result of Appendix 2.B, simplified to the SISO case):


$\sigma_{BLA}^2(k) = \frac{1}{M}\, \frac{\sigma_Y^2(k)}{S_{UU}(k)}$    (2-27)

To summarize, for frequency $k$ we have:

• $G_{BLA}(j\omega_k)$: the Best Linear Approximation.

• $\sigma_{BLA}^2(k)$: the total sample variance of the estimate $G_{BLA}(j\omega_k)$, due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

Remark: When choosing the number of blocks $M$ to split the input and output data, a trade-off is made between the leakage effects and the uncertainty due to the stochastic nonlinear behaviour and the measurement noise. When $M$ is large, this results in shorter time records of length $N$. Hence, the leakage will be more important, since leakage is an $O(N^{-1})$ effect for a rectangular window, and an $O(N^{-2})$ effect for a Hanning window [53]. Furthermore, the frequency resolution diminishes with increasing $M$. On the other hand, it can be seen from (2-27) that the uncertainty on the estimate $G(k)$ reduces for a larger $M$. To conclude, with the number of blocks $M$ we can balance between a lower variance and a better frequency resolution of the BLA.
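As an illustration of (2-24) to (2-27), the sketch below estimates the BLA and its total variance from a single non-periodic SISO record. It assumes a noiseless input and implements the diff window as the difference of adjacent DFT lines; these choices, and the data layout, are assumptions rather than the implementation used in this thesis.

    import numpy as np

    def bla_siso_nonperiodic(u, y, n_blocks):
        # split the record into M blocks and compute block DFT spectra
        M = n_blocks
        N = len(u) // M
        U = np.fft.fft(np.asarray(u)[:M * N].reshape(M, N), axis=1)
        Y = np.fft.fft(np.asarray(y)[:M * N].reshape(M, N), axis=1)
        Ud = np.diff(U, axis=1)               # diff window: X_d(k) = X(k+1) - X(k)
        Yd = np.diff(Y, axis=1)

        # sample cross- and auto-power spectra, eq. (2-24)
        Syu = (Yd * Ud.conj()).mean(axis=0)
        Suu = (np.abs(Ud) ** 2).mean(axis=0)
        Syy = (np.abs(Yd) ** 2).mean(axis=0)
        Suy = Syu.conj()

        G_bla = Syu / Suu                                        # eq. (2-25)
        varY = M / (2 * (M - 1)) * np.real(Syy - G_bla * Suy)    # eq. (2-26)
        var_bla = varY / (M * Suu)                               # eq. (2-27)
        return G_bla, var_bla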


2.5.2 Multiple Input, Multiple Output Systems

Next, we explain how the Best Linear Approximation is estimated for MIMO systems. In the case of periodic excitation signals, the essential difference between the SISO and the MIMO framework is the need for multiple experiments. This stems from the fact that the influences of the different inputs, superposed in a single experiment, need to be separated. Again, a distinction is made between periodic and non periodic data.

A. Periodic Data

When periodic signals are employed to determine the Best Linear Approximation of a MIMO system, the experiments are usually carried out according to the scheme depicted in Figure 2-11. In total, $M$ blocks of $n_u$ experiments are performed, as shown on the vertical axis. After the transients have settled, $P$ periods of the input and the output are measured for each experiment. Per block $m$ and period $p$, we assemble the DFT spectra of the $n_u$ inputs and $n_y$ outputs in the matrices $U(k)^{[m,p]} \in \mathbb{C}^{n_u \times n_u}$ and $Y(k)^{[m,p]} \in \mathbb{C}^{n_y \times n_u}$, respectively. Then, the input and output spectra are averaged over the $P$ periods per block $m$. For frequency $k$, we have:

$U(k)^{[m]} = \frac{1}{P} \sum_{p=1}^{P} U(k)^{[m,p]}, \qquad Y(k)^{[m]} = \frac{1}{P} \sum_{p=1}^{P} Y(k)^{[m,p]}$    (2-28)

Figure 2-11. Experiment design to calculate the BLA of a MIMO nonlinear system: $M \times n_u$ experiments (vertical axis), each with a transient part followed by $P$ measured periods $U^{[m,p]}, Y^{[m,p]}$, from which $G^{[m]}$ and $C_n^{[m]}$ are computed per block.

For every block $m$, we now calculate an FRF estimate $G^{[m]}(j\omega_k) \in \mathbb{C}^{n_y \times n_u}$:

$G^{[m]}(j\omega_k) = Y(k)^{[m]} \left( U(k)^{[m]} \right)^{-1}.$    (2-29)

The $M$ FRF estimates $G^{[m]}(j\omega_k)$ are then combined in order to obtain the Best Linear Approximation $G_{BLA}(j\omega_k)$:

$G_{BLA}(j\omega_k) = \frac{1}{M} \sum_{m=1}^{M} G^{[m]}(j\omega_k)$    (2-30)
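A minimal sketch of (2-29) and (2-30) at one frequency bin, assuming that the period-averaged input and output DFT matrices of the $M$ blocks are already available (the array layout is an assumption, not a convention from the thesis):

    import numpy as np

    def bla_mimo_periodic_bin(U_blocks, Y_blocks):
        # U_blocks: (M, n_u, n_u) period-averaged input DFT matrices, one per block
        # Y_blocks: (M, n_y, n_u) period-averaged output DFT matrices
        # eq. (2-29): G^[m] = Y^[m] (U^[m])^-1, one n_y x n_u FRF matrix per block
        G = np.stack([Ym @ np.linalg.inv(Um) for Ym, Um in zip(Y_blocks, U_blocks)])
        # eq. (2-30): the BLA is the average of the M block FRFs
        return G, G.mean(axis=0)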


Again, we are able to make a distinction between the nonlinear distortions and the effect of the measurement noise. First, we will determine the sample covariance matrix of $G_{BLA}(j\omega_k)$ due to the measurement noise. For the discussion why the calculation should be carried out via the covariances of the input and output spectra, we refer to the SISO case (“Periodic Data” on p. 28). The sample covariance matrices of the averaged DFT spectra $U(k)^{[m]}$ and $Y(k)^{[m]}$ are given by:

$C_U^{[m]} = \frac{1}{P(P-1)} \sum_{p=1}^{P} \mathrm{vec}\!\left( U^{[m,p]} - U^{[m]} \right) \mathrm{vec}\!\left( U^{[m,p]} - U^{[m]} \right)^H$

$C_Y^{[m]} = \frac{1}{P(P-1)} \sum_{p=1}^{P} \mathrm{vec}\!\left( Y^{[m,p]} - Y^{[m]} \right) \mathrm{vec}\!\left( Y^{[m,p]} - Y^{[m]} \right)^H$    (2-31)

$C_{YU}^{[m]} = \frac{1}{P(P-1)} \sum_{p=1}^{P} \mathrm{vec}\!\left( Y^{[m,p]} - Y^{[m]} \right) \mathrm{vec}\!\left( U^{[m,p]} - U^{[m]} \right)^H$

In (2-31), the frequency index $k$ was omitted in order to simplify the formulas. From $C_U^{[m]} \in \mathbb{C}^{n_u n_u \times n_u n_u}$, $C_Y^{[m]} \in \mathbb{C}^{n_y n_u \times n_y n_u}$, and $C_{YU}^{[m]} \in \mathbb{C}^{n_y n_u \times n_u n_u}$, the sample covariance $C_n^{[m]}$ of $G^{[m]}$ is estimated with (see Appendix 2.A):


$C_n^{[m]} = \left( \left( U^{[m]} \right)^{-T} \otimes I_{n_y} \right) C_Y^{[m]} \left( \left( U^{[m]} \right)^{-T} \otimes I_{n_y} \right)^H + \left( \left( U^{[m]} \right)^{-T} \otimes G^{[m]} \right) C_U^{[m]} \left( \left( U^{[m]} \right)^{-T} \otimes G^{[m]} \right)^H - 2\,\mathrm{herm}\!\left\{ \left( \left( U^{[m]} \right)^{-T} \otimes I_{n_y} \right) C_{YU}^{[m]} \left( \left( U^{[m]} \right)^{-T} \otimes G^{[m]} \right)^H \right\}$    (2-32)

with $C_n^{[m]} \in \mathbb{C}^{n_y n_u \times n_y n_u}$. In a noiseless input framework, expression (2-32) simplifies to

$C_n^{[m]} = \left( \left( U^{[m]} \right)^{-T} \otimes I_{n_y} \right) C_Y^{[m]} \left( \left( U^{[m]} \right)^{-T} \otimes I_{n_y} \right)^H$    (2-33)

Since $M$ estimates $C_n^{[m]}$ are at our disposal, we can combine them in order to obtain an improved estimate of the covariance matrix that characterizes the measurement noise:

$C_n = \frac{1}{M^2} \sum_{m=1}^{M} C_n^{[m]}$    (2-34)

The combined effect of the stochastic nonlinear behaviour and the measurement noise is characterized by the total sample covariance $C_{BLA}$. This quantity is determined from the $M$ estimates $G^{[m]}(j\omega_k)$:

$C_{BLA} = \frac{1}{M(M-1)} \sum_{m=1}^{M} \mathrm{vec}\!\left( G^{[m]} - G_{BLA} \right) \mathrm{vec}\!\left( G^{[m]} - G_{BLA} \right)^H$    (2-35)

The total covariance of the BLA is equal to the sum of the measurement noise covariance $C_n(k)$ and the covariance due to the stochastic nonlinear contributions $C_{NL}(k)$:

$C_{BLA}(k) = C_{NL}(k) + C_n(k)$    (2-36)

Hence, $C_{NL}(k)$ is estimated with

$C_{NL}(k) = C_{BLA}(k) - C_n(k)$    (2-37)


To conclude, for frequency $k$ we have:

• $G_{BLA}(j\omega_k)$: the Best Linear Approximation.

• $C_{BLA}(k)$: the total sample covariance matrix of the estimate $G_{BLA}(j\omega_k)$, due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

• $C_n(k)$: the measurement noise sample covariance matrix of the estimate $G_{BLA}(j\omega_k)$.

• $C_{NL}(k)$: the sample covariance of the stochastic nonlinear contributions.

Remark: When random phase multisines are used to perform FRF measurements on a multivariable system, it is possible to make an optimal choice for the phases. Within a block of experiments $m$, orthogonal random phase multisines should be used whenever possible, as they are optimal in the sense that they minimize the variance of the estimated BLA [18],[85]. When these signals are used, the condition number of the matrix $U(k)^{[m]}$ in equation (2-29) equals one. Hence, the inverse of $U(k)^{[m]}$ in (2-29) is calculated in optimal numerical conditions.

Orthonormal random phase multisines are created in the following way. First, $n_u$ ordinary random phase multisines are generated: $U_1, \ldots, U_{n_u}$. Then, a unitary matrix $W \in \mathbb{C}^{n_u \times n_u}$ is used to define the excitation signals for the $n_u$ experiments. For frequency $k$, the applied signal is:

$U(k) = \begin{bmatrix} w_{11} U_1(k) & \ldots & w_{1 n_u} U_{n_u}(k) \\ \vdots & & \vdots \\ w_{n_u 1} U_1(k) & \ldots & w_{n_u n_u} U_{n_u}(k) \end{bmatrix},$    (2-38)

where we omitted the block index $[m]$. For $W$, the DFT matrix can for instance be used:

$w_{kl} = \frac{1}{\sqrt{n_u}} \exp\!\left( -\frac{j 2\pi (k-1)(l-1)}{n_u} \right),$    (2-39)


where $w_{kl}$ is the element of $W$ at position $(k, l)$. In the case of a system with three inputs ($n_u = 3$), we have

$W = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \\ 1 & \exp\!\left( -\frac{j 2\pi}{3} \right) & \exp\!\left( \frac{j 2\pi}{3} \right) \\ 1 & \exp\!\left( \frac{j 2\pi}{3} \right) & \exp\!\left( -\frac{j 2\pi}{3} \right) \end{bmatrix}.$    (2-40)
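A sketch of this construction: $n_u$ ordinary random phase multisines are combined with the unitary DFT matrix $W$ of (2-39), so that (for equal amplitude spectra) the $n_u \times n_u$ input matrix of one block has condition number one. The time-domain synthesis and the scaling used below are assumptions, not conventions from the thesis.

    import numpy as np

    def orthogonal_multisine_block(n_u, N, excited_bins, seed=0):
        # returns u[i, e, :]: time signal of input i during experiment e of one block
        rng = np.random.default_rng(seed)

        # n_u ordinary random phase multisines U_1 ... U_{n_u} (flat amplitude, random phases)
        U = np.zeros((n_u, N), dtype=complex)
        U[:, excited_bins] = np.exp(2j * np.pi * rng.random((n_u, len(excited_bins))))

        # unitary DFT matrix W, eq. (2-39)
        idx = np.arange(n_u)
        W = np.exp(-2j * np.pi * np.outer(idx, idx) / n_u) / np.sqrt(n_u)

        # eq. (2-38): input i during experiment e gets the spectrum w_{ie} * U_e(k)
        u = np.empty((n_u, n_u, N))
        for e in range(n_u):
            spectra = W[:, [e]] * U[e]                            # (n_u, N) spectra
            u[:, e, :] = 2 * np.real(np.fft.ifft(spectra, axis=1))  # real multisines
        return u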

B. Non Periodic Data

Contrary to the case of periodic excitation signals, no simple methods are available to distinguish between the nonlinear distortions and the noise effects when non periodic excitations are used. Furthermore, leakage errors will be present when calculating the input and the output spectra. However, it is still possible to characterize the combined effect of the nonlinear distortions and the measurement noise.

First, the input and output data are split into $M$ blocks, with $M > n_u$. Then, the input and output DFT spectra are calculated per block. Again, leakage effects are diminished by means of a Hanning or diff window [65]. The next step is to calculate the sample cross-power spectrum $S_{YU}(k)$ of the inputs and outputs and the auto-power spectrum $S_{UU}(k)$ of the inputs using (2-24). The Best Linear Approximation is then given by

$G_{BLA}(j\omega_k) = S_{YU}(k)\, S_{UU}^{-1}(k).$    (2-41)

If we assume noiseless inputs, the following expression can be used to calculate the covariance of the output spectrum:

$C_Y(k) = \frac{M}{2(M - n_u)} \left( S_{YY}(k) - G_{BLA}(j\omega_k)\, S_{UY}(k) \right),$    (2-42)

where the factor 2 stems from using a diff window. Making use of $C_Y(k)$, the uncertainty of $G_{BLA}(j\omega_k)$ can be derived (see Appendix 2.B):

$C_{BLA}(k) = \frac{1}{M}\, S_{UU}^{-T}(k) \otimes C_Y(k)$    (2-43)


To summarize, for frequency $k$ we have:

• $G_{BLA}(j\omega_k)$: the Best Linear Approximation.

• $C_{BLA}(k)$: the total sample covariance matrix of the estimate $G_{BLA}(j\omega_k)$, due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

Remark: Again, a trade-off is made when choosing the number of blocks $M$. See the SISO case (“Non Periodic Data” on p. 32) for a discussion.


Appendix 2.A Calculation of the FRF Covariance from the Input/Output Covariances

The measured input and output DFT coefficients for a block of $n_u$ experiments are given by

$U(k) = U_0(k) + N_U(k)$
$Y(k) = Y_0(k) + N_Y(k)$    (2-44)

for frequencies $k = 1, \ldots, F$. $U_0(k) \in \mathbb{C}^{n_u \times n_u}$ and $Y_0(k) \in \mathbb{C}^{n_y \times n_u}$ are the noiseless Fourier coefficients; $N_U(k)$ and $N_Y(k)$ are the contributions of all the noise sources in the experimental set-up.

Assumption 2.11 (Disturbing Noise): The input and output errors $N_U(k)$ and $N_Y(k)$ satisfy the following set of equations:

$\mathbb{E}\{ \mathrm{vec}(N_U(k)) \} = 0$
$\mathbb{E}\{ \mathrm{vec}(N_Y(k)) \} = 0$
$\mathbb{E}\{ \mathrm{vec}(N_U(k))\, \mathrm{vec}(N_U(k))^H \} = C_U(k)$    (2-45)
$\mathbb{E}\{ \mathrm{vec}(N_Y(k))\, \mathrm{vec}(N_Y(k))^H \} = C_Y(k)$
$\mathbb{E}\{ \mathrm{vec}(N_Y(k))\, \mathrm{vec}(N_U(k))^H \} = C_{YU}(k) = C_{UY}^H(k)$

Furthermore, we assume that $N_U(k)$ and $N_Y(k)$ are independent of $U_0(k)$ and $Y_0(k)$.

The FRF estimate $G(j\omega_k)$ is given by

$G(j\omega_k) = Y(k)\, U(k)^{-1} = \left( Y_0(k) + N_Y(k) \right) \left( U_0(k) + N_U(k) \right)^{-1}.$    (2-46)


We will calculate the variability of the FRF estimate using a first order Taylor approximation. For notational simplicity, we will omit the frequency index $k$ in the following calculations. First, we isolate $U_0^{-1}$ from $U^{-1}$:

$G = \left( Y_0 + N_Y \right) \left( \left( I_{n_u} + N_U U_0^{-1} \right) U_0 \right)^{-1} = \left( Y_0 + N_Y \right) U_0^{-1} \left( I_{n_u} + N_U U_0^{-1} \right)^{-1}$    (2-47)

Next, we apply the Taylor expansion, restricting ourselves to the first order terms. For small $\alpha$, we have

$\left( I_{n_u} + \alpha \right)^{-1} \approx I_{n_u} - \alpha.$    (2-48)

When we apply this to (2-47), we obtain

$G = \left( Y_0 + N_Y \right) U_0^{-1} \left( I_{n_u} - N_U U_0^{-1} \right),$    (2-49)

and when we omit the second order terms in $N_U$ and $N_Y$,

$G = Y_0 U_0^{-1} - Y_0 U_0^{-1} N_U U_0^{-1} + N_Y U_0^{-1}.$    (2-50)

We define

$G_0 = Y_0 U_0^{-1},$    (2-51)

and then rewrite (2-50):

$G = G_0 + N_G = G_0 - G_0 N_U U_0^{-1} + N_Y U_0^{-1}$    (2-52)

Hence, we obtain

$N_G = N_Y U_0^{-1} - G_0 N_U U_0^{-1}.$    (2-53)

In order to compute $\mathrm{vec}(N_G)$ as a function of $\mathrm{vec}(N_U)$ and $\mathrm{vec}(N_Y)$, we will apply the following vectorization property:


$\mathrm{vec}(ABC) = \left( C^T \otimes A \right) \mathrm{vec}(B)$    (2-54)

This results in

$\mathrm{vec}(N_G) = \left( U_0^{-T} \otimes I_{n_y} \right) \mathrm{vec}(N_Y) - \left( U_0^{-T} \otimes G_0 \right) \mathrm{vec}(N_U).$    (2-55)

Next, we determine the covariance matrix $C_G$, which is defined as

$C_G = \mathbb{E}\left\{ \mathrm{vec}(N_G)\, \mathrm{vec}(N_G)^H \right\}.$    (2-56)

By combining equations (2-45), (2-55) and (2-56), we obtain:

$C_G = \left( U_0^{-T} \otimes I_{n_y} \right) C_Y \left( U_0^{-T} \otimes I_{n_y} \right)^H + \left( U_0^{-T} \otimes G_0 \right) C_U \left( U_0^{-T} \otimes G_0 \right)^H - 2\,\mathrm{herm}\!\left\{ \left( U_0^{-T} \otimes I_{n_y} \right) C_{YU} \left( U_0^{-T} \otimes G_0 \right)^H \right\}$    (2-57)
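The first order result (2-57) can be checked numerically with a small Monte Carlo experiment. The sketch below (all values are illustrative and hypothetical) draws random input/output perturbations with known covariances and compares the sample covariance of $\mathrm{vec}(N_G)$ with the right hand side of (2-57); for small perturbations the relative deviation stays at the Monte Carlo noise level of a few percent.

    import numpy as np

    rng = np.random.default_rng(1)
    n_u, n_y, runs, sig = 2, 2, 20000, 1e-3
    U0 = rng.standard_normal((n_u, n_u)) + 1j * rng.standard_normal((n_u, n_u))
    Y0 = rng.standard_normal((n_y, n_u)) + 1j * rng.standard_normal((n_y, n_u))
    G0 = Y0 @ np.linalg.inv(U0)

    NG = np.empty((runs, n_y * n_u), dtype=complex)
    for r in range(runs):
        NU = sig * (rng.standard_normal((n_u, n_u)) + 1j * rng.standard_normal((n_u, n_u)))
        NY = sig * (rng.standard_normal((n_y, n_u)) + 1j * rng.standard_normal((n_y, n_u)))
        G = (Y0 + NY) @ np.linalg.inv(U0 + NU)
        NG[r] = (G - G0).flatten(order="F")          # vec() stacks the columns
    CG_mc = NG.T @ NG.conj() / runs                  # sample covariance of vec(N_G)

    A = np.kron(np.linalg.inv(U0).T, np.eye(n_y))    # U0^-T (x) I_ny
    B = np.kron(np.linalg.inv(U0).T, G0)             # U0^-T (x) G0
    CU = 2 * sig**2 * np.eye(n_u * n_u)              # covariance of vec(N_U)
    CY = 2 * sig**2 * np.eye(n_y * n_u)              # covariance of vec(N_Y); C_YU = 0 here
    CG = A @ CY @ A.conj().T + B @ CU @ B.conj().T   # eq. (2-57) with C_YU = 0
    print(np.max(np.abs(CG_mc - CG)) / np.max(np.abs(CG)))   # a few percent (MC noise)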


Appendix 2.B Covariance of the FRF for Non Periodic Data

If we assume a noiseless input framework, then the measured input and output DFT spectra for block $m$ are given by

$U^{[m]}(k) = U_0^{[m]}(k)$
$Y^{[m]}(k) = Y_0^{[m]}(k) + N_Y^{[m]}(k)$    (2-58)

for frequencies $k = 1, \ldots, F$. $U_0(k) \in \mathbb{C}^{n_u \times 1}$ and $Y_0(k) \in \mathbb{C}^{n_y \times 1}$ are the noiseless DFT spectra; $N_Y^{[m]}(k)$ represents the contributions of all the noise sources and the stochastic nonlinear behaviour in the experimental set-up.

Assumption 2.12 (Disturbing Noise): The output error $N_Y^{[m]}(k)$ satisfies the following set of equations:

$\mathbb{E}\{ N_Y^{[m]}(k) \} = 0$
$\mathbb{E}\{ N_Y^{[m]}(k)\, N_Y^{[m]H}(k) \} = C_Y(k)$    (2-59)

Furthermore, we assume that $N_Y^{[m]}(k)$ is uncorrelated with $U_0(k)$ or $Y_0(k)$.

The noiseless FRF estimate $G_0(j\omega_k)$ is given by

$G_0(j\omega_k) = S_{Y_0 U_0}(k)\, S_{U_0 U_0}^{-1}(k).$    (2-60)

We will calculate the variability of the FRF estimate and omit the frequency index $k$ in the following calculations for notational simplicity. From (2-58), we have

$Y^{[m]} = Y_0^{[m]} + N_Y^{[m]} = G\, U_0^{[m]}.$    (2-61)

We right-multiply both sides of equation (2-61) with $\frac{1}{M} U_0^{[m]H}$, and compute the summation over block index $m$:


$\frac{1}{M} \sum_{m=1}^{M} Y_0^{[m]} U_0^{[m]H} + \frac{1}{M} \sum_{m=1}^{M} N_Y^{[m]} U_0^{[m]H} = G\, \frac{1}{M} \sum_{m=1}^{M} U_0^{[m]} U_0^{[m]H},$    (2-62)

or when we apply (2-24)

$S_{Y_0 U_0} + \frac{1}{M} \sum_{m=1}^{M} N_Y^{[m]} U_0^{[m]H} = G\, S_{U_0 U_0}.$    (2-63)

We then right-multiply (2-63) with $S_{U_0 U_0}^{-1}$ and use (2-60):

$G = G_0 + N_G = G_0 + \frac{1}{M} \sum_{m=1}^{M} N_Y^{[m]} U_0^{[m]H} S_{U_0 U_0}^{-1}$    (2-64)

Hence, we obtain

$N_G = \frac{1}{M} \sum_{m=1}^{M} N_Y^{[m]} U_0^{[m]H} S_{U_0 U_0}^{-1}.$    (2-65)

In order to compute $\mathrm{vec}(N_G)$ as a function of $\mathrm{vec}(N_Y^{[m]}) = N_Y^{[m]}$, we apply the vectorization property

$\mathrm{vec}(ABC) = \left( C^T \otimes A \right) \mathrm{vec}(B).$    (2-66)

This results in

$\mathrm{vec}(N_G) = \frac{1}{M} \sum_{m=1}^{M} \left( \left( U_0^{[m]H} S_{U_0 U_0}^{-1} \right)^T \otimes I_{n_y} \right) N_Y^{[m]}.$    (2-67)

Next, we determine the covariance matrix $C_G$, which is defined as

$C_G = \mathbb{E}\left\{ \mathrm{vec}(N_G)\, \mathrm{vec}(N_G)^H \right\}.$    (2-68)

By combining equations (2-67) and (2-68) we obtain:


$C_G = \frac{1}{M^2}\, \mathbb{E}\left\{ \left( \sum_{m=1}^{M} \left( \left( U_0^{[m]H} S_{U_0 U_0}^{-1} \right)^T \otimes I_{n_y} \right) N_Y^{[m]} \right) \left( \sum_{n=1}^{M} \left( \left( U_0^{[n]H} S_{U_0 U_0}^{-1} \right)^T \otimes I_{n_y} \right) N_Y^{[n]} \right)^H \right\}$    (2-69)

In order to eliminate the two sums, we make use of the independency of $N_Y^{[m]}$ and $N_Y^{[n]}$ for $m \neq n$, and of $U_0^{[m]}$ and $U_0^{[n]}$ for $m \neq n$. We also have that $N_Y^{[m]}$ is uncorrelated with $U_0^{[n]}$ for any $m$ and $n$. This results in

$C_G = \frac{1}{M^2} \sum_{m=1}^{M} \mathbb{E}\left\{ \left( \left( U_0^{[m]H} S_{U_0 U_0}^{-1} \right)^T \otimes I_{n_y} \right) N_Y^{[m]} N_Y^{[m]H} \left( \left( U_0^{[m]H} S_{U_0 U_0}^{-1} \right)^T \otimes I_{n_y} \right)^H \right\}.$    (2-70)

Applying (2-59) together with $\left( S_{XX}^{-1} \right)^H = S_{XX}^{-1}$, and taking into account the fact that $C_Y = 1 \otimes C_Y$, results in

$C_G = \frac{1}{M^2} \sum_{m=1}^{M} \left( S_{U_0 U_0}^{-T}\, U_0^{[m]HT} \otimes I_{n_y} \right) \left( 1 \otimes C_Y \right) \left( U_0^{[m]T}\, S_{U_0 U_0}^{-T} \otimes I_{n_y} \right).$    (2-71)

Next, we make use of the Mixed-Product rule:

$\left( A \otimes B \right) \left( C \otimes D \right) = AC \otimes BD,$    (2-72)

provided that $A$, $B$, $C$, $D$ have compatible matrix dimensions. We then obtain

$C_G = \frac{1}{M} \left( S_{U_0 U_0}^{-T} \left( \frac{1}{M} \sum_{m=1}^{M} U_0^{[m]} U_0^{[m]H} \right)^{T} S_{U_0 U_0}^{-T} \right) \otimes C_Y,$    (2-73)

or finally, after using (2-24) and reintroducing the frequency index $k$:

$C_G(k) = \frac{1}{M}\, S_{U_0 U_0}^{-T}(k) \otimes C_Y(k).$    (2-74)


CHAPTER 3

FAST MEASUREMENT OF QUANTIZATION DISTORTIONS

A measurement technique is proposed to characterize the non-idealities of DSP algorithms which are induced by quantization effects, overflows (fixed point), or nonlinear distortions, in one single experiment/simulation. The main idea is to apply specially designed multisine excitations such that a distinction can be made between the output of the ideal system and the contributions of the system's non-idealities. This approach makes it possible to compare and quantify the quality of different implementation alternatives. Applications of this method include for instance digital filters, FFTs, and audio codecs.


3.1 Introduction

Digital Signal Processing (DSP) systems have the advantage of being flexible when compared with analog circuits. However, they are prone to calculation errors, especially when a fixed point implementation is used. These errors induce non-ideal signal contributions at the output, such as quantization noise, limit cycles, and nonlinear distortions. In order to be sure that the design specifications are still met, it is necessary to verify the presence of these effects and to quantify them. A simple approach uses single sine excitations to test the system. Unfortunately, this method does not reveal all problems and requires many experiments in order to cover the full frequency range. A more thorough approach consists of a theoretical analysis of the Device Under Test (DUT). A good example of this is given in [50], where the effect of coefficient quantization in Infinite Impulse Response (IIR) systems is analysed. The main disadvantage is that for every new DUT, or for every different implementation of the DUT, the full analysis needs to be repeated. This can be an involved and time consuming task, as a different approach may be needed for every new DUT.

In this chapter, a method is proposed that detects and quantifies the quantization errors using one single multisine experiment with a well chosen amplitude spectrum. First, the measurement concept will be introduced. Then, a brief discussion of the major errors in DSP algorithms will be given. Finally, the results will be illustrated on a number of examples.


3.2 The Multisine as a Detection Tool for Non-idealities

Here, a special kind of multisine will be used, namely a Random Odd Multisine (see “The Multisine as a Detection Tool for Nonlinearities” on p. 26). Using this excitation signal, it is possible to extract the BLA $G_{BLA}(j\omega)$ of the DSP system for the class of Gaussian excitations with a fixed Power Spectral Density [62],[66]. The BLA can be obtained in two ways. The first way is to average the measured transfer functions over different phase realizations of the input multisine (see “Estimating the Best Linear Approximation” on p. 28). The second way is to identify a low order parametric model from a single phase realization. In order to have a realistic idea of the level of nonlinear distortions, the power spectrum of the excitation signal should be chosen such that it coincides with the power spectrum of the input signal that the system will see in later use.

To summarize, the aim of the proposed method is twofold:

1. Extract the BLA in order to evaluate the linear behaviour of the DSP system;

2. Detect and quantify the nonlinear effects in the DSP system in order to see whether the design specifications are met.

The main advantage of this method is that it can be used for any DSP system or algorithm, as long as the aim is to achieve a linear operation. A possible drawback is the limitation to a specific class of input signals. However, it must be said that the class of Gaussian signals is not that restrictive, since, for example, most telecommunication signals fall inside this class [4], [13].


3.3 DSP Errors

In this section, multisines are used to quantify the quantization distortions. A fixed point digital filter serves as an example. The advantages of a fixed point representation in DSP systems are well-known when compared to floating point processing: fixed point implementations are both faster and cheaper. However, they suffer from a number of serious drawbacks as well. They require some knowledge about the expected dynamic range of the input signal and the intermediate signal levels, and they impose a finite numerical precision and a finite dynamic range for the internal representation of the processed samples. In contrast with a fixed point representation, floating point arithmetic allows input numbers with a practically unlimited dynamic range. Depending on how the numbers are quantized (e.g. ceil, floor or round), different kinds of distortion arise in the fixed point implementation. The overflow behaviour (saturation or two's complement overflow) also plays an important role, because it can lead to a higher level of distortions and even to chaotic behaviour [8].

To investigate the influence of the finite quantization and the range problems, a fourth order Butterworth low pass filter is considered with a normalized cut-off frequency of 0.2 Hz, to be operated at a sampling rate of fs = 1 Hz. The normalized, non-quantized filter coefficients are (in Matlab notation, with a precision of five digits after the decimal point):

b = [0.00204 0.00814 0.01222 0.00814 0.00204]
a = [0.42203 -1.00000 0.97657 -0.44510 0.07908]    (3-1)

with b the numerator and a the denominator coefficients.

The filter is implemented in Direct Form (DF) II [50], with 32 bit wide accumulators and a 16 bit wide data memory, including one sign bit. The fixed point representation is illustrated in Figure 3-1. The accumulators consist of 14 integer bits and 17 fractional bits; the memory has 7 integer bits and 8 fractional bits. These settings are summarized in Table 3-1, together with the largest and smallest numbers that can be represented, and the least significant bit (lsb). The lsb is nothing more than the numerical resolution.

Figure 3-1. Fixed point representation: 1: Sign bit - 2: Integer bits - 3: Fractional bits.


              Sign Bit   Int. Bits   Fr. Bits   Max. Val.        Min. Val.   lsb
Accumulator   1          14          17         2^14 - 2^-17     -2^14       2^-17
Memory        1          7           8          2^7 - 2^-8       -2^7        2^-8

Table 3-1. DSP settings.

The input signal consists of 16 phase realizations of a Random Odd Multisine, each with a period length of 1024 samples. The random grid has excited lines between DC (f = 0 Hz) and 80% of the Nyquist frequency (f = 0.4 Hz). Using a block length of 4, this results in 154 excited harmonic lines. The default RMS value of the input signal is set to about 328 lsb, which is 1% of the largest number that can be represented in the data memory. Two periods of each phase realization are applied. Here, the first period of the output signal is discarded in order to eliminate transients. A common way to determine the number of samples that need to be discarded is to plot the difference between subsequent output periods; the number of required transient points can then be determined by visual inspection.

Next, the Best Linear Approximation is calculated by averaging the measured Frequency Response Functions (FRFs) over the phase realizations [56]. The total length of the input sequence applied to the DUT is in our experiment 1024 * 2 * 16 = 32 768 samples. The default truncation and overflow behaviour (these terms will be explained in the following sections) are set to rounding and saturation, respectively, but we will alter these settings in order to analyse their influence.

3.3.1 Truncation Errors of the Filter Coefficients

Numerous different truncation methods exist [34]. Since the exact implementation of the truncation is of no importance to the proposed method, we shall consider two common truncation methods: arithmetic rounding and flooring. Both truncation methods are depicted in Figure 3-2. The solid black line represents the truncation characteristic, the dashed line stands for the ideal behaviour, and the grey line shows the average behaviour of the truncation method. The floor operation is often used since it requires the least computation time: the least significant bits of the calculated samples are simply discarded. However, it introduces an average offset of ½ lsb, which is not present when using the computationally more involved rounding method.


To inspect the quantization effect on the filter coefficients, the FRF is calculated for every phase realization by dividing the measured output spectrum by the input spectrum at the excited frequencies. Then, these FRFs are averaged over all the phase realizations. The following definition is used:

$x_d(t) = x(t T_s), \quad t = 1, 2, \ldots, N \qquad X = \mathrm{DFT}(x) \;\Leftrightarrow\; X(l) = \frac{1}{N} \sum_{t=1}^{N} x_d(t)\, e^{-\frac{j 2\pi t l}{N}}$    (3-2)

$G$ is the measured transfer function of the DUT:

$G(k f_0) = \frac{Y(k f_0)}{U(k f_0)}, \quad k \in \text{excited frequency lines},$    (3-3)

with $U$ and $Y$ the DFT of the input and output signals, respectively:

$U = \mathrm{DFT}(u), \qquad Y = \mathrm{DFT}(y).$    (3-4)

Figure 3-2. Truncation characteristics: (a) floor (with an average offset of ½ lsb), (b) round.

The measured transfer function can now be compared with the designed one (see Figure 3-3) for both truncation methods. Since the quantization error of the coefficients results in displaced system poles, the realized transfer functions (full grey lines) differ from the designed one (black line). Furthermore, the amplitude of the complex model errors (dashed grey lines) is shown on the same plot. From Figure 3-3, we conclude that rounding should be used in this particular example in order to implement a filter with a transfer function that is as close as possible to the designed transfer function in the pass-band. Hence, we will continue the analysis in the following sections with the rounded coefficients.

Figure 3-3. Distortion of the transfer function due to quantized filter coefficients (ideal, round, and floor).
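The following sketch is a simplified model of the set-up described above (it is not the simulation code used in this thesis): a Direct Form II filter whose coefficients and delay-line states are quantized to the 16 bit data-memory grid of Table 3-1, with selectable truncation (round/floor) and overflow (saturation/two's complement) behaviour. The accumulator is simply assumed to be wide enough.

    import numpy as np

    def quantize(x, int_bits=7, frac_bits=8, mode="round", overflow="saturate"):
        # map x onto the fixed point grid: 1 sign bit, int_bits integer bits, frac_bits fractional bits
        lsb = 2.0 ** -frac_bits
        q = (np.floor if mode == "floor" else np.round)(np.asarray(x, dtype=float) / lsb) * lsb
        hi, lo = 2.0 ** int_bits - lsb, -2.0 ** int_bits
        if overflow == "saturate":
            return np.clip(q, lo, hi)
        return (q - lo) % (2.0 ** (int_bits + 1)) + lo        # two's complement wrap-around

    def df2_fixed_point(b, a, u, mode="round", overflow="saturate"):
        # Direct Form II filter; coefficients and delay-line states live in the data memory
        bq = quantize(b, mode=mode, overflow=overflow)
        aq = quantize(a, mode=mode, overflow=overflow)
        w = np.zeros(len(a) - 1)                              # delay line (data memory)
        y = np.empty(len(u))
        for n, un in enumerate(u):
            acc = (un - np.dot(aq[1:], w)) / aq[0]            # recursive part (accumulator)
            w0 = quantize(acc, mode=mode, overflow=overflow)  # write state back to memory
            y[n] = np.dot(bq, np.r_[w0, w])                   # non-recursive part
            w = np.r_[w0, w[:-1]]                             # shift the delay line
        return y

Driving df2_fixed_point with a random odd multisine and inspecting the output DFT at the detection lines should show, qualitatively, the even/odd distortion behaviour analysed in the following subsections.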


3.3.2 Finite Precision Distortion

The level of nonlinear distortions at the output, due to the finite precision of the arithmetic and the storage operations, is investigated from the same data that was used to calculate $G$. In Figure 3-4 (a) and (b), the output spectrum is plotted as a function of the frequency for the rounding and floor operation, respectively. At the non excited lines in these plots, crosses represent the even contributions, and circles denote the odd contributions. Next, the distortion level at these lines is extrapolated to the excited lines. Hence, the Signal to Distortion Ratio (SDR) at the latter can be determined. At the even detection lines, the even nonlinearities (e.g. $x^2$) pop up, and at the odd detection lines, the odd nonlinearities (e.g. $x^3$) are visible. From Figure 3-4 (a) and Figure 3-4 (b), it can be seen that the rounding method only leads to odd nonlinearities, while the floor operation leads to both odd and even nonlinearities. This behaviour can be explained by the symmetry properties of the truncation characteristics (see Figure 3-2). The rounding operation is an odd function: $\mathrm{Round}(-x) = -\mathrm{Round}(x)$. If this function were decomposed in its Taylor series, it would only consist of odd terms. The following properties hold for odd functions:

• the sum of two odd functions is odd;

• the multiplication of an odd function with a constant number is odd;

• the cascade of odd functions results in an odd function;


The filtering operation performed by the DUT consists of successive additions and

multiplications with constant numbers (i.e., the filter coefficients). That is why at the output of

the system only odd nonlinearities are seen when rounding is used as a truncation method.

The floor operation is neither an odd, nor an even nonlinear operation. This leads to even and

odd terms when this function is decomposed in its Taylor Series. As a result, odd and even

nonlinearities are both observed in the output spectrum. Figure 3-4 (b) also shows the

presence of a DC component corresponding to the offset of ½ lsb that is introduced by the

flooring operation (see Figure 3-2).

In the pass band, we observe that for this RMS level the SDR due to the finite precision effects

is about 60 dB.

Figure 3-4. Output spectrum of a single multisine for different truncation methods: (a) round, (b) floor. The plots distinguish the excited lines, the odd detection lines, and the even detection lines.


3.3.3 Finite Range Distortion

Finally, the effects of the limited dynamic range of the numeric representation are discussed. Two approaches can be used to deal with this problem. First, when no precautions are taken and the value of the samples exceeds the allowed dynamic range, a two's complement overflow will occur. The resulting transfer characteristic is given in Figure 3-5. A second and better method to deal with the limited range is to detect the overflow and to saturate the representation. This leads to the characteristic given in Figure 3-6.

In the following simulation experiment, the RMS value of the input signal is increased by a factor of 3, up to 983 lsb. Figure 3-7 shows heavy distortion of the transfer function in the case of two's complement overflow, while in the case of saturation the distortion is much lower. In Figure 3-8 (a) and (b), the level of nonlinear distortion at the output is analysed, using the same data as before.

Figure 3-5. Transfer characteristic for two's complement overflow.

Figure 3-6. Transfer characteristic for saturation overflow.


Figure 3-7. Transfer function for finite range (ideal, saturation, and two's complement overflow).

Figure 3-8. Output spectrum of a single multisine for different overflow methods: (a) saturation, (b) two's complement overflow (excited, odd detection, and even detection lines).


We observe the presence of a small nonlinear contribution at the even detection lines. This is caused by the asymmetry of the fixed point representation: the largest positive number is $2^7 - 2^{-8}$, while the largest negative number is $-2^7$. Consequently, the relation $\mathrm{Overflow}(-x) = -\mathrm{Overflow}(x)$ does not hold for all $x$ (neither for the saturation, nor for the two's complement overflow), resulting in a nonlinear behaviour that is not strictly odd. Thus, when rounding is used to truncate the numbers, the even distortions in the output spectrum can only be caused by an overflow event. Hence, the even detection lines can act as a warning flag for internal overflows.

3.3.4 Influence of the Implementation

The method presented here is also appropriate to evaluate the relative performance of different types of implementations. In the following experiment, we compare three types of implementations: ordinary Direct Form II, a cascade of Second Order Sections (SOS) optimized to reduce the probability of overflow, and a cascade of SOS reducing the round-off noise. The results for the different implementations are shown in Figure 3-9. For this particular system and RMS value of the input signal (328 lsb), the Direct Form II implementation shows the smallest quantization noise.


Figure 3-9. Output spectrum of a single multisine for different implementations: (a) DF II, (b) SOS optimized against overflow, (c) SOS optimized against round-off noise (excited, odd detection, and even detection lines).


3.4 Quality Analysis of Audio Codecs

The method described above can also be applied to the encoding and decoding of music in a compressed format, for instance MP3. Although the coding/decoding process cannot be considered as strictly time-invariant, we can still use the multisine technique to get an idea of the level of distortion that arises. This makes it easy to compare different codecs at identical bit rates, or different bit rates for the same codec. Of course, the psycho-acoustic models employed in the encoding process can only be rated through subjective listening tests; they are not considered here. Such models are used to determine the spectral music components that are less audible to the human ear. This effect is caused by the so-called masking effect: the human ear is less sensitive to small spectral components residing in the proximity of large spectral components. The codec takes advantage of this shortcoming in order to achieve higher compression ratios [31].

Since MP3 codecs are designed to handle music, it is more interesting to use a music sample instead of a multisine signal. However, the benefits of even and odd detection lines are still needed. Detection lines in a music sample can easily be obtained with the following procedure. Consider the music sample vector $x$ and the following anti-symmetrical sequence:

$\begin{bmatrix} x & x & 0_{[x]} & -x & -x & 0_{[x]} \end{bmatrix},$    (3-5)

where $0_{[x]}$ denotes a set of zero samples with the same length as vector $x$. For such a sequence, all the frequency lines which are a multiple of 2 or 3 of the fundamental frequency will be zero. Hence, they will serve as detection lines for the odd and even nonlinearities.

The LAME MP3 codec (version 1.30, engine 3.92 [88]) is used in this example. The input signal is a monophonic music sequence, sampled at 44.1 kHz with 16 bits per sample. In order to have a representative set of data, an excerpt of $2^{20}$ samples from a pop song is used. After applying the above procedure (3-5) to create detection lines, the length of the test sequence is increased to about $6 \cdot 10^6$ samples. In order not to overload the figures, the number of plotted points in Figure 3-10 has been reduced.

The level of distortion for three bit rates (64 kbps, 128 kbps and 256 kbps) is plotted at the left hand side of Figure 3-10. For the 64 kbps encoding/decoding process, the distortion level lies 30 dB below the signal level at low frequencies. The results for 128 kbps and 256 kbps show that the distortion level decreases by about 10 dB per additional 64 kbps.
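The sketch below illustrates the construction (3-5) together with a quick numerical check that the lines at multiples of 2 or 3 of the fundamental frequency of the long sequence indeed vanish. The random vector simply stands in for a music excerpt; it is an assumption for the sake of the example.

    import numpy as np

    def detection_sequence(x):
        # build the anti-symmetrical test sequence of eq. (3-5): [x, x, 0, -x, -x, 0]
        z = np.zeros_like(x)
        return np.concatenate([x, x, z, -x, -x, z])

    x = np.random.default_rng(0).standard_normal(1024)      # stand-in for a music excerpt
    S = np.fft.fft(detection_sequence(x))
    k = np.arange(len(S))
    print(np.max(np.abs(S[(k % 2 == 0) | (k % 3 == 0)])))   # essentially zero: detection lines
    print(np.max(np.abs(S)))                                 # signal lines carry the power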


Figure 3-10. MP3 coding/decoding distortion and the BLA for different bit rates (64, 128 and 256 kbps). Left: distortion level at the excited, odd detection and even detection lines. Right: amplitude and phase of the Best Linear Approximation.


Furthermore, the Best Linear Approximation is computed for the three bit rates and plotted at

the right hand side of Figure 3-10. We see a flat amplitude spectrum and a zero phase, as

expected. The variations in the amplitude spectrum for the lowest bit rate are probably

caused by the effect of masking (which lines are masked is decided by the psycho-acoustic

model of the encoder). It can also be observed that in the encoding process for 64 and

128 kbps, a low pass characteristic is present (cut-off frequencies of 14 kHz and 18 kHz,

respectively). Consequently, the MP3 codec cuts off the high frequencies when low bit rates

are used.


3.5 Conclusion

In this chapter, we showed that it is possible to identify and quantify many non-idealities that occur in DSP systems, using custom designed multisines. The proposed concepts make it possible to verify quickly whether the input range and the quantization level are well chosen for input signals with a given pdf and power spectrum. We have illustrated the ideas for an IIR system, for which the impact of the filter coefficient quantization, the presence of round-off noise, the overflow behaviour, and the effect of the chosen implementation can easily be measured and compared with the design specifications. Finally, the method was successfully applied to analyse the performance of an audio compression/decompression process.


CHAPTER 4

IDENTIFICATION OF NONLINEAR FEEDBACK SYSTEMS

In this chapter, a method is proposed to estimate block-oriented models which are composed of a linear, time-invariant system and a static nonlinearity in the feedback loop. By rearranging the model's structure and by imposing one delay tap for the linear system, the identification process is reduced to a linear problem, allowing a fast estimation of the feedback parameters. The numerical parameter values obtained by solving the linear problem are then used as starting values for the nonlinear optimization. Finally, the proposed method is illustrated on measurements from a physical system.


4.1 Introduction

Many physical systems contain, in an implicit manner, a nonlinear feedback. Consider for example a mass-spring-damper system with a nonlinear, hardening spring. For this system, the differential equation describing the displacement $y_c(t)$ of the mass $m$ is given by

$m\, \ddot{y}_c(t) + d\, \dot{y}_c(t) + k_1\, y_c(t) + k_3\, y_c^3(t) = u_c(t),$    (4-1)

where $u_c(t)$ is the input force and $d$ is the damping coefficient. The constants $k_1$ and $k_3$ characterize the behaviour of the hardening spring ($k_3 > 0$).

To demonstrate the implicit feedback behaviour of this system, equation (4-1) is rewritten as follows

$m\, \ddot{y}_c(t) + d\, \dot{y}_c(t) + k_1\, y_c(t) = u_c(t) - k_3\, y_c^3(t).$    (4-2)

The model structure which corresponds to this equation is shown in Figure 4-1. In this block scheme, $G(s)$ is the Laplace transfer function between the input signal $u_c(t)$ and the output signal $y_c(t)$ of the system. The nonlinear block NL contains a static nonlinearity and represents the term $k_3\, y_c^3(t)$, which is fed back with a negative sign.

Figure 4-1. LTI system with static nonlinear feedback.

In the following sections, we will develop an identification procedure for this kind of nonlinear feedback system.


4.2 Model Structure

For a band-limited input signal $u_c(t)$ (for which the power spectrum $S_{uu}(\omega) = 0$ for $\omega > \omega_{max}$), the linear system $G(s)$ can be approximated in the frequency domain by a discrete-time model

$G(z, \theta_L) = \frac{\sum_{i=0}^{n_b} b_i z^{-i}}{\sum_{j=0}^{n_a} a_j z^{-j}} \quad \text{and} \quad \theta_L = (a, b),$    (4-3)

provided that the sampling frequency $f_s$ is sufficiently high such that no aliasing occurs. In this respect, note that the bandwidth of $y_c(t)$ is possibly higher than the bandwidth of the input signal. This is due to the nonlinearity which is present in the feedback loop. The relation between the discrete-time signals $u$ and $y$ and the continuous-time signals $u_c$ and $y_c$, for a given sampling period $T_s$, is described by the following equations

$u(t) = u_c(t T_s), \qquad y(t) = y_c(t T_s)$    (4-4)

The relation between input $u(t)$ and output $y(t)$ is then given by

$y(t) = G(z, \theta_L)\, u(t).$    (4-5)

The static nonlinearity $f_{NL}$ is represented by a polynomial

$f_{NL}(x) = \sum_{l=0}^{r} p_l\, x^l \quad \text{and} \quad \theta_{NL} = p.$    (4-6)

The vector $\theta$, which contains all the model parameters, is defined as

$\theta = (\theta_L, \theta_{NL}).$    (4-7)

The proposed model structure to identify the system is shown in Figure 4-2. The set of difference equations that describe the input/output relation is given by

Page 82: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

66

(4-8)

Without losing generality, we divide the first equation by , or equivalently, set . We

then substitute the second equation of (4-8) into the first one, and get

. (4-9)

We isolate the terms in and move them to the left hand side:

. (4-10)

This equation is an example of a nonlinear algebraic loop [41]: a nonlinear algebraic equation

should be solved for every time step when the model output is calculated. In principle, this

problem can be tackled by a numeric solver. The disadvantage of this loop is that it can have

multiple solutions. In fact, it is even possible that no solution exists when the degree is

even. As an example, let us take a look at the polynomial shown in Figure 4-3. The black

curve is a polynomial of third degree; it corresponds to the left hand side of (4-10). The grey

horizontal levels represent possible values of the right hand side of (4-10). In this example,

ajy t j–( )

j 0=

na

∑ bix t i–( )

i 0=

nb

∑=

x t( ) u t( ) plyl

t( )

l 0=

r

∑–=⎩⎪⎪⎪⎨⎪⎪⎪⎧

+-

G z θL,( )u t( ) y t( )

NL θNL( )

Figure 4-2. Model structure.

x t( )+

a0 a0 1≡

y t( ) biu t i–( )

i 0=

nb

∑ ajy t j–( )

j 1=

na

∑– bi plyl t i–( )

l 0=

r

∑i 0=

nb

∑–=

yl

t( )

y k[ ] b0 plyl k[ ]

l 0=

r

∑+ biu k i–[ ]

i 0=

nb

∑ ajy k j–[ ]

j 1=

na

∑– bi plyl k i–[ ]

l 0=

r

∑i 1=

nb

∑–=

t

r

Page 83: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Model Structure

67

we observe that according to the horizontal level, the number of solutions varies: one solution

for the solid grey line, two solutions for the dotted line, and three solutions for the dash

dotted line. Although should be a good initial guess leading the numeric solver to the

correct solution, we prefer to avoid multiple solutions. To do this in a simple way, we will

impose one delay tab for the linear block or, equivalently, in equation (4-10) will be set to

zero. Taking into account the imposed delay, we obtain the following model equation

. (4-11)

y t 1–( )

Figure 4-3. Number of solutions of the algebraic equation.

b0

y t( ) biu t i–( )

i 1=

nb

∑ ajy t j–( )

j 1=

na

∑– bi plyl t i–( )

l 0=

r

∑i 1=

nb

∑–=

Page 84: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

68

4.3 Estimation Procedure

A three step procedure is used to identify the parameters of the model equations (4-11). In

the first step, the parameters of the linear model are identified. Next, the coefficients of the

static nonlinearity are estimated. Finally, a nonlinear search procedure is employed in order to

refine the initial values obtained in the first two steps.

4.3.1 Best Linear Approximation

The first step in the identification process consists in estimating the Best Linear Approximation

(BLA) from the measured input and output for (see “Estimating the

Best Linear Approximation” on p. 28). A parametric linear model is then estimated in the -

domain and denoted as . Note that the linear behaviour of the nonlinear

feedback branch is implicitly included in the estimated BLA (see Figure 4-4).

4.3.2 Nonlinear Feedback

In the second step, the nonlinear block which is present in the feedback branch is identified.

To achieve this, the feedback loop is opened, and the model is restructured as shown in

Figure 4-5. In order to keep the identification simple, the measured output is used at

the input of the block. The idea of using measured outputs instead of estimated outputs

in order to avoid recurrence, is similar to the series-parallel architecture from the identification

of neural networks [47].

um t( ) ym t( ) t 1 …N,=

Z

GBLA z θL,( )

Lin NL( )

NL (NL)

+

-uc t( ) yc t( )+

-

Lin(NL)

GBLA z θL,( )

Figure 4-4. The BLA (grey box) of a Nonlinear Feedback system.

+ + G s( )

ym t( )

NL

Page 85: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Estimation Procedure

69

As will be shown in what follows, opening the loop allows the formulation of an estimation

problem that is linear-in-the-parameters for a fixed . From Figure 4-5, we

obtain the following equation

. (4-12)

The residual is defined as

, (4-13)

where and are the measured input and output, respectively. Note that is

independent of the parameters . Next, the error needs to be minimized:

. (4-14)

To achieve this, the least squares cost function

(4-15)

is minimized with respect to .

θNL GBLA z θL,( )

y t( ) GBLA z θL,( ) u t( ) plyl t( )

l 0=

r

∑–

⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞

=

w t( )

NL

+

-GBLA z θL,( )u t( ) y t( )

y t( )

Figure 4-5. Rearranged model structure.

+

w t( ) ym t( ) GBLA z θL,( )um t( )–≡

um t( ) ym t( ) w t( )

θNL ew t θNL,( )

ew t θNL,( ) w t( ) G– BLA z θL,( ) plyml t( )

l 0=

r

∑⎝ ⎠⎜ ⎟⎜ ⎟⎛ ⎞

⎝ ⎠⎜ ⎟⎜ ⎟⎛ ⎞

–=

V θNL( )

V θNL( ) ew t θNL,( )2

t 1=

N

∑=

pi

Page 86: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

70

Since is a known linear operator, independent of , this minimization is a

problem that is linear-in-the-parameters which can be solved in the time or frequency domain.

Its solution is given by the matrix equation

, (4-16)

with

, (4-17)

and where and are vectors that contain the elements and for

, respectively. In (4-17), the power should be computed elementwise, and the

operator should be applied to all the columns. The pseudo-inverse can be

calculated in a numerical stable way via a Singular Value Decomposition (SVD) [27]. Since

measurements are used in the observation matrix of this linear least squares problem, a

bias is present on the estimated parameters [56]. However, when the Signal to Noise

Ratio (SNR) achieved by the measurement set-up is reasonable, this bias remains small,

yielding results that are well enough to initialize the nonlinear search procedure.

4.3.3 Nonlinear Optimization

The starting values obtained from the initialization procedure can be improved by solving the

full nonlinear estimation problem. The Levenberg-Marquardt algorithm (see “The Levenberg-

Marquardt Algorithm” on p. 135) is used to minimize the weighted least squares cost function

, (4-18)

where is a user-chosen, frequency domain weighting matrix. The model error

is defined as

, (4-19)

where and are the DFT of the measured and modelled output from equation

(4-11), respectively. Hence, the cost function is formulated in the frequency domain, which

GBLA z θL,( ) θNL

θNL H+w=

H G– BLA z θL,( ) ym0 ym

1 … ymr

⎝ ⎠⎛ ⎞=

w ym w t( ) ym t( )

t 1 … N, ,=

GBLA z θL,( ) H+

H

θNL

VWLS θ( ) W k( ) ε k θ,( ) 2

k 1=

F

∑=

W k( ) �∈

ε k θ,( ) �∈

ε k θ,( ) Ym k θ,( ) Y k( )–=

Ym k θ,( ) Y k( )

Page 87: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Estimation Procedure

71

enables the use of nonparametric weighting. Typically, the weighting matrix is chosen

equal to the inverse covariance matrix of the output . This matrix can be obtained

straightforwardly when periodic signals are used to excite the DUT.

In two different situations, leakage can appear in equation (4-19): when arbitrary excitations

are employed, or when subharmonics are present in the measured or modelled output. In the

first case, the leakage can be reduced by windowing techniques, or by increasing the length

of the data record. In the second case, it suffices to increase the DFT window length such that

an integer number of periods are measured in the window.

The Levenberg-Marquardt algorithm requires the computation of the derivatives of the model

error to the model parameters . The analytical expressions of the Jacobian are given in

Appendix 4.A. In these expressions, the modelled outputs are utilized instead of the measured

outputs. Hence, the bias which was present in the previous section due to the noise on the

output is now removed.

W k( )

CY1–

k( )

θ

Page 88: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

72

4.4 Experimental Results

We will now apply the ideas of the previous sections to a practical measurement set-up. The

Device Under Test is an electronic circuit, also known as the Silverbox [67], which emulates

the behaviour of a mass-spring-damper system with a nonlinear spring. The experimental

data originate from a single measurement and contain two main parts. In all experiments, the

input and output signals are measured at a sampling frequency of 10 MHz/214 = 610.35 Hz.

The first part of the data is a filtered Gaussian signal with a RMS value that increases linearly

with time. This sequence consists of 40 700 samples and has a bandwidth of 200 Hz; it will be

used for validation purposes. Note that the amplitude of the validation sequence exceeds the

amplitude of the estimation sequence. A warning is in place here: generally speaking,

extrapolation during the validation test should be avoided, since it reveals no information

about the model quality. Good extrapolation performance, certainly in a black box framework,

is often a matter of luck: the model structure happens to correspond exactly to the system’s

internal structure. The second part of the data consists of ten consecutive realizations of a

random phase multisine with 8192 samples and 500 transient points per realization, depicted

in Figure 4-6 with alternating colours. The bandwidth of the excitation signal is also 200 Hz

and its RMS value is 22.3 mV. The multisines will be employed to estimate the model. In this

measurement, odd multisines were used:

(4-20)

Figure 4-6. Excitation signal that consists of a validation and an estimation set.

0 50 100 150 200

0.15

0.1

0.05

0

0.05

0.1

0.15

Input

Sig

nal [V

]

Time [s]

EstimationValidation

Uk A= k 2n 1+= n �∈

Uk 0= elsewhere

Page 89: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Experimental Results

73

using the same symbols as in equation (2-3). The phases were chosen uniformly

distributed over . We will extract the BLA of the DUT by averaging the measured

transfer functions for different phase realizations of the multisine [56],[62],[66]. Note that

this also diminishes the measurement noise on the BLA.

4.4.1 Linear Model

We will start with a comparison of two second order linear models. The first model has

completely unrestricted parameters. For the second model, we impose one delay tab which is

equivalent to forcing to zero during the estimation. As mentioned before, this delay is

imposed in order to avoid an algebraical loop. Furthermore, the order of the numerator is

increased to reduce the model error. The consequences of the delay are now investigated by

comparing the quality of the following models:

1. Full linear model:

(4-21)

2. Linear model with imposed delay:

(4-22)

The estimation of the parameters of and for the DUT is carried out in the

frequency domain, using the ELiS Frequency Domain Identification toolbox [56]. The resulting

models are plotted in Figure 4-7 (dash dotted and dashed grey lines, respectively), together

with the BLA (solid black line). From this figure, it can be seen that imposing a delay results in

a slight distortion of the modelled transfer function for high frequencies. The following step

consists in validating these models by using the validation data set. The simulation error

signals of the linear models are plotted in Figure 4-8 (b) and (c), together with the measured

output signal (a). From these plots, we conclude that the same model quality is achieved for

both linear models: the Root Mean Square Error (RMSE) obtained with the validation data set

is 14.3 mV. This means that imposing a delay tab does not significantly deteriorate the quality

of the linear model in this particular experimental set-up.

φk

[0 2π, )

b0

G1 z( )b0 b1z 1– b2z 2–+ +

a0 a1z 1– a2z 2–+ +----------------------------------------------=

G2 z( )b1z 1– b2z 2– b3z 3– b4z 4– b5z 5–+ + + +

a0 a1z 1– a2z 2–+ +-----------------------------------------------------------------------------------------------=

G1 z( ) G2 z( )

Page 90: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

74

0 50 100 15020

10

0

10

20

Am

plit

ude [

dB]

0 50 100 150

150

100

50

0

Phase

[°]

Frequency [Hz]

Figure 4-7. Measured FRF (solid black line),model (dash dotted grey line) and model (dashed grey line).G1 z( ) G2 z( )

0 10 20 30 40 50 60

0.2

0

0.2

(a)

Measured Output Signal [V]

0 10 20 30 40 50 60

0.2

0

0.2

Error Linear Model 1 RMSE: 14.3 mV

(b)

0 10 20 30 40 50 60

0.2

0

0.2

Error Linear Model 2 RMSE: 14.3 mV

(c)

Time [s]

Figure 4-8. Validation of the linear models.

Page 91: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Experimental Results

75

The estimated coefficients of the second linear model are

. (4-23)

We now proceed with the identification procedure using the second linear model and by

extending it with the static nonlinear feedback branch (see Figure 4-2).

4.4.2 Estimation of the Nonlinear Feedback Coefficients

After estimating the linear transfer characteristics, the nonlinear feedback coefficients are

estimated in the time domain, using the ten measured multisine realizations. Several degrees

were tried out, and yielded the best result. The identified coefficients are

. (4-24)

Again, the model is validated on the Gaussian noise sequence. Figure 4-9 (a) shows the

simulation error of the Nonlinear Feedback model. Note that the vertical scale is enlarged 10

times compared with the plots in Figure 4-8. The RMSE has dropped with more than a factor

10 compared to the linear model. Furthermore in Figure 4-9 (a), the large spikes in the error

signal have disappeared.

b 0 0.4838 0.0987– 0.1217 -0.0782 0.0257 ,= a 1 -1.4586 0.9323=

p

r r 1 : 3=

p -0.0347 -0.0260 3.9177=

0 10 20 30 40 50 60

0.02

0

0.02

Simulation error NLFB model RMSE: 1.01 mV

(a)

0 10 20 30 40 50 60

0.02

0

0.02

Simulation error NLFB model, optimized RMSE: 0.77 mV

(b)

Time [s]

Figure 4-9. Validation of the nonlinear models.

Page 92: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

76

4.4.3 Nonlinear Optimization

The proposed identification procedure significantly enhances the results compared to the

linear model. But, we can achieve even better results by applying the nonlinear optimization

method from section 4.3.3. Since no covariance information is available from the measured

data, a constant weighting is employed. The resulting simulation error after applying the

Levenberg-Marquardt algorithm is plotted in Figure 4-9 (b). The RMSE then decreases further

with about 20% to 0.77 mV.

4.4.4 Upsampling

To obtain a further improvement of the modelling results, we will upsample the input and

output data. The idea behind upsampling is that the influence of the delay, which is imposed

artificially and which is one sample period long, is reduced. Hence, the model quality should

improve.

After upsampling the input and output data with a factor 2, the estimation procedure

described in the previous sections is applied. The simulation error of the Nonlinear Feedback

model is shown in Figure 4-10 (a). We observe indeed that the validation test yields better

results: the RMS error has decreased to 0.70 mV. In addition, a nonlinear search routine is

used to optimize the parameters, resulting in a simulation error of 0.38 mV (Figure 4-10 (b)).

The modelling results are summarized in Table 4-1. The linear models and the linear parts of

0 10 20 30 40 50 60

0.02

0

0.02

Simulation error NLFB model, P=2 RMSE: 0.7 mV

(a)

0 10 20 30 40 50 60

0.02

0

0.02

Simulation error NLFB model, P=2, optimized RMSE: 0.38 mV

(b)

Time [s]

Figure 4-10. Validation of the nonlinear models for upsampled data.

Page 93: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Experimental Results

77

the Nonlinear Feedback models are all of order , ; the degree of the

polynomial is set to . From Table 4-1, we conclude that by extending the linear

model with a static nonlinear feedback, the simulation error on the validation data set is

reduced by more than 20 dB: from 14.3 mV to 1.01 mV. The nonlinear optimization slightly

improves this result down to 0.77 mV. Furthermore, using upsampling the total error

reduction increases to more than 30 dB compared to the linear model. Finally, the simulation

error of the optimized nonlinear model using the upsampled data set decreases to 0.38 mV.

For a comparison with other modelling approaches on the same DUT, using the same data

set, we refer to the Silverbox case study in Chapter 6 (see “Comparison with Other

Approaches” on p. 151).

Model RMSE Validation

Linear 14.3 mV

Linear + delay 14.3 mV

NLFB, =610 Hz 1.01 mV

NLFB, =610 Hz (optimized) 0.77 mV

NLFB, =1221 Hz 0.70 mV

NLFB, =1221 Hz (optimized) 0.38 mV

Table 4-1. Summary of the modelling results.

na 2= nb 5= r

r 1 : 3=

fs

fs

fs

fs

Page 94: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

78

In Figure 4-11, the amplitude spectrum of the simulation errors of the various models is

shown, together with the measured output spectrum (solid black line). The solid grey line

represents the error of the (unrestricted) linear model. The grey and black dots represent the

Nonlinear Feedback model errors with and without upsampling, respectively.

0 50 100 150 200 250 30030

20

10

0

10

20

30

40Am

plit

ude [

dB]

Frequency [Hz]

Figure 4-11. Spectrum of the measured output (solid black line); linear model error (solid grey line); NLFB model error (black dots);

NLFB model error for upsampled data (grey dots).

Page 95: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Conclusion

79

4.5 Conclusion

The technique proposed in this chapter provides a practical and fast way to model systems

that are composed of a linear, time-invariant system and a static nonlinear feedback. The

estimated model gives satisfying modelling results, which can be further improved by applying

a nonlinear optimization procedure. We have applied the method on experimental data, and

obtained good results. The modelling error was significantly reduced to less than 3% of the

error obtained with an ordinary linear model.

Page 96: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 4: Identification of Nonlinear Feedback Systems

80

Appendix 4.A Analytic Expressions for the JacobianRecall equation (4-11) which describes the Nonlinear Feedback model

. (4-25)

In order to use the Levenberg-Marquardt algorithm, we need to compute the derivatives of

the output to the model parameters, i.e., the Jacobian. The Jacobian elements are defined as:

(4-26)

Finally, we obtain:

(4-27)

y k( ) biu k i–( ) biplyl k i–( ) ajy k j–( )

j 1=

na

∑–

l 1=

r

∑i 1=

nb

∑+

i 1=

nb

∑=

Jbnk( ) y k( )∂

bn∂-------------≡ Jan

k( ) y k( )∂an∂

-------------≡ Jpnk( ) y k( )∂

pn∂-------------≡

Jbnk( ) u k n–( ) ply

lk n–( )

l 1=

r

∑ plbilyl 1– k i–( )Jbn

k i–( )

l 1=

r

∑i 1=

nb

∑ ajJbnk j–( )

j 1=

na

∑–+ +=

Jank( ) y k n–( )– plbily

l 1– k i–( )Jank i–( )

l 1=

r

∑i 1=

nb

∑ ajJank j–( )

j 1=

na

∑+ +=

Jpnk( ) biy

n k i–( )

i 1=

nb

∑ plbilyl 1– k i–( )Jpn

k i–( )

l 1=

r

∑i 1=

nb

∑ ajJpnk j–( )

j 1=

na

∑–+=

⎩⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎧

Page 97: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

81

CHAPTER 5

NONLINEAR STATE SPACE

MODELLING OF MULTIVARIABLE

SYSTEMS

This chapter deals with the modelling of multivariable nonlinear

systems. We will compare a number of candidate model structures

and select the one that is most suitable for our modelling problem.

Next, the different classes of systems that can exactly be represented

by the selected model structure are discussed. Finally, an identification

procedure to determine the model parameters is presented.

Page 98: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

82

5.1 IntroductionThe aim of this chapter is to model nonlinear Multiple Input, Multiple Output (MIMO) systems.

One way to achieve this is to examine the Device Under Test (DUT) thoroughly, and to build a

model using first principles: for instance, physics and chemistry laws. This process can be very

time-consuming, since it requires an exact knowledge about the system’s structure and all its

parameters. It also induces that the system should be fully understood, which is not always

feasible to achieve. Another way to tackle the modelling problem is to consider the system as

a black box. In that case, the only available information about the system is given by its

measured inputs and outputs. This approach usually means that no physical parameters or

quantities are estimated. Hence, no physical interpretation whatsoever can be given to the

model. Black box modelling implies the application of a model structure that is as flexible as

possible, since no information about the device’s internal structure is utilized. Often, this

flexibility results in a high number of parameters.

In this chapter, we will make use of discrete-time models. One of the arguments for this

choice is that, when looking to control applications, discrete-time descriptions are more

suitable since control actions are usually taken at discrete time instances. Furthermore, the

estimation of nonlinear continuous-time models is not a trivial task, and can be

computationally involved, because it may imply the calculation of time-derivatives or integrals

of sophisticated nonlinear functions of the measured signals [79]. Finally, it should be noted

that a continuous-time approach is not strictly necessary, since we are not interested in the

estimation of physical system parameters.

One of the objectives is to choose a model structure that is suitable for MIMO systems.

Hence, it is important that the common dynamics, present in the different outputs of the DUT,

are exploited in such a way that they result in a smaller number of model parameters. First, a

number of candidate model structures found in literature will be examined. Next, a specific

model structure will be selected, and the relation with some other model structures (standard

block-oriented nonlinear models, among others) will be investigated. Finally, an identification

procedure for the selected model structure will be proposed.

Page 99: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

The Quest for a Good Model Structure

83

5.2 The Quest for a Good Model Structure

The literature regarding nonlinear system identification is vast, and the number of available

model structures is practically unlimited. In order not to re-invent the wheel, we will briefly

discuss a number of candidate model structures, and pick the one that seems most adequate.

Initially, only deterministic models are considered: the presence of any kind of noise is

ignored. First, two popular examples of input/output models are considered. Volterra and

NARX models are both satisfying from a system theoretic viewpoint, and because of their

approximation capabilities.

5.2.1 Volterra Models

An introduction to Volterra series was already given in the first chapter (see “The Volterra-

Wiener Theory” on p. 4). Since we have chosen the Volterra-Wiener approach as a framework

for the Best Linear Approximation (see “Properties of the Best Linear Approximation” on

p. 21), it is a logical first step to consider Volterra models. These models have already been

employed in many application fields [43]: in video and image enhancement, speech

processing, communication channel equalization, and compensation of loudspeaker

nonlinearities.

The main advantage of Volterra series is their conceptual simplicity, because they can be

viewed as generalized LTI descriptions. Furthermore, they are open loop models for which the

stability is easy to check and to enforce. However, in the case of a nonparametric

representation, these benefits do not outweigh one important disadvantage: when identifying

discrete-time Volterra kernels, an enormous number of kernel coefficients needs to be

identified, even for a modest kernel degree. We illustrate this with a simple example of a SISO

Volterra model. Table 5-1 shows the number of kernel samples of two Volterra

functionals for a memory length of , and samples. Note that triangular/

regular kernels were considered in order to eliminate the redundancy of the kernel

coefficients. The number of effective kernel samples is then computed using the following

binomial coefficient [43] (see also Appendix 5.A):

, (5-1)

Nkern

N 10= N 100=

NkernN n 1–+

n⎝ ⎠⎛ ⎞=

Page 100: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

84

where is the kernel degree. Table 5-1 shows that the number of required kernel coefficients

increases dramatically for growing model degree and memory length. From this example, it is

clear that, despite the theoretical insights they provide, nonparametric Volterra functionals are

not useful in practical identification situations. However, the combinatorial growth of

can be tackled in various ways. For instance, a frequency domain IIR representation can be

employed for the kernels [61]. But, in a black box framework such a parametrization is not

straightforward, and poses a difficult model selection problem. Another solution is to apply

interpolation methods to approximate the kernels in the time or frequency domain. This

approach works well when the kernels exhibit a certain smoothness, see for instance [48].

The application of this idea leads to a significant decrease in parameters and, thus,

measurement time.

A multivariable Volterra model is, by definition, composed of different MISO models. When

these models are parametrized independently, no advantage is taken of the common

dynamics that appear in the different outputs. Consequently, such a representation does not

satisfy our needs, since we are looking for a more parsimonious model structure.

5.2.2 NARX Approach

To avoid the excessive numbers of parameters, NARX (Nonlinear AutoRegressive model with

eXogeneous inputs) models were intensively studied in the eighties, as an alternative for

Volterra series. The generic single input, single output NARX model is defined as

, (5-2)

where is an arbitrary nonlinear function of delayed inputs and outputs [7]. Contrary to

the Volterra approach, the model output is now also a function of delayed output

Degree

1 10 1002 55 50503 220 171 7004 715 4 421 275

Table 5-1. Number of kernel coefficients in a nonparametric representation,

as a function of the kernel degree and the memory length .

n N 10= N 100=

Nkern

n N

n

Nkern

ny

y t( ) f y t 1–( ) … y t ny–( ) u t 1–( ) … u t nu–( ), , , , ,( )=

f .( )

y t( )

Page 101: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

The Quest for a Good Model Structure

85

samples. This is very similar to the extension of linear FIR models to IIR models. This has two

important consequences. First of all, a longer memory length is achieved without suffering

from the dramatic increase in the number of parameters, as was observed with nonparametric

Volterra series. Secondly, due to the nonlinear feedback which is present, the stability analysis

becomes very difficult compared with Volterra models. This is the major price that is paid for

the increase in flexibility.

In [37], it is proven that a nonlinear discrete-time, time-invariant system can always be

represented by a general NARX model in a region around an equilibrium point, when it is

subject to two sufficient conditions:

• the response function of the system is finitely realizable (i.e., the state

space representation has a finite number of states);

• a linearised model exists when the system operates close to the chosen

equilibrium point.

Often, a more specific kind of NARX models is employed: polynomial NARX models. By

applying the Stone-Weierstrass Theorem [16], it was shown in [7] that these models can

approximate any sampled nonlinear system arbitrarily well, under the assumption that the

space of input and output signals is compact (i.e., bounded and closed).

The NARX model can also be used to handle multivariable systems. Just as with Volterra

models, multivariable NARX models are defined as a set of MISO NARX models:

(5-3)

with the output index, the number of inputs, the delay per output ,

and the maximum delay [37]. For this general nonlinear model, there is

no straightforward way to parametrize the functions , such that advantage is taken of the

yi t n+( ) fi y1 t n1 1–+( ) … y1 t( ) , , ,[=

…yny

t nny1–+( ) yny

t nny2–+( ) … yp t( ) , , , ,

u1 t n+( ) u1 t n 1–+( ) … u1 t( ) , , , ,

…unu

t n+( ) unut n 1–+( ) … unu

t( ), , , ]

i 1 … ny, ,= nu ni yi

n max n1 … np, ,( )=

fi

Page 102: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

86

common dynamics present in the different outputs. Hence, we will investigate another class of

models than input/output models, namely state space models. It will turn out that the latter

are a very suitable description for multiple input, multiple output systems.

5.2.3 State Space Models

The most natural way to represent systems with multiple inputs and outputs is to use the

state space framework. In its most general form, a -th order discrete-time state space

model is expressed as

(5-4)

In these equations, is the vector that contains the input values at time

instance , and is the vector of the outputs. The state vector

represents the memory of the system, and contains the common dynamics present in the

different outputs. The use of this intermediary variable constitutes the essential difference

between state space and input/output models. For the latter, the memory is created by

utilizing delayed inputs or outputs. The first equation of (5-4), referred to as the state

equation, describes the evolution of the state as a function of the input and the previous

state. The second equation of (5-4) is called the output equation. It relates the system output

with the state and the input. Furthermore, the state space representation is not unique. By

means of a similarity transform, the model equations (5-4) can be converted into a new model

that exhibits exactly the same input/output behaviour. The similarity transform

with an arbitrary non-singular square matrix yields

(5-5)

Note that when the similarity transform is applied to arbitrary functions and , the resulting

functions and do not necessarily have the same form as and . This is illustrated in

the following example. Consider the output equation

, (5-6)

na

x t 1+( ) f x t( ) u t( ),( )=

y t( ) g x t( ) u t( ),( )=⎩⎨⎧

u t( ) �nu∈ nu

t y t( ) �∈ny

ny x t( ) �na∈

xT t( ) T 1– x t( )= T

xT t 1+( ) T 1– f TxT t( ) u t( ),( ) fT xT t( ) u t( ),( )= =

y t( ) g TxT t( ) u t( ),( ) gT xT t( ) u t( ),( )= =⎩⎨⎧

f g

fT gT f g

y t( ) ax1 t( )------------ bx2 t( )+=

Page 103: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

The Quest for a Good Model Structure

87

where and are the model parameters. When we define as the -th element of the

matrix , the similarity transform results in

, (5-7)

which obviously cannot be written in the form

. (5-8)

Hence, the similarity transform can have an influence on the model complexity. Whether the

similarity transform introduces redundancy in the representation, depends on the (fixed)

parametrization of and . However, this issue is not so important to us: all the state space

models discussed in this chapter retain their model structure under a similarity transform. In

what follows, we assume that and , such that is an

equilibrium state. In the following section, we describe different kinds of state space models.

A. Linear State Space Models

The model equations of the well-known linear state space model are given by

(5-9)

with the state space matrices , , , and .

The transfer function that corresponds to (5-9) is given by

, (5-10)

with the identity matrix of dimension . From (5-10), it is clear that the poles of

are given by the eigenvalues of . By means of a similarity transform, the set of state space

matrices can be converted into a new set that exhibits exactly the same

input/output behaviour. The similarity transform with an arbitrary non-

singular square matrix yields

a b tij i j,( )

T 1–

y t( ) at11x

1Tt( ) t12x

2Tt( )+

---------------------------------------------------- b t21x1T

t( ) t22x2T

t( )+( )+=

y t( )aT

x1T t( )--------------- bTx2T t( )+=

f g

f 0 0,( ) 0= h 0 0,( ) 0= x 0=

x t 1+( ) Ax t( ) Bu t( )+=

y t( ) Cx t( ) Du t( )+=⎩⎨⎧

A �na na×

∈ B �na nu×

∈ C �ny na×

∈ D �ny nu×

G z( )

G z( ) C zInaA–( ) 1–

B D+=

Inana G z( )

A

ABCD ATBTCTDT

xT t( ) T 1– x t( )=

T

Page 104: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

88

(5-11)

It is easily verified that the similarity transform has no influence on the transfer function:

(5-12)

B. Bilinear State Space Models

A continuous-time, bilinear state space model is defined as

(5-13)

where , , , , and are the

bilinear state space matrices. These models are a straightforward extension of linear state

space models, which enables them to cope with nonlinear systems. It was shown in the past

that, in continuous-time, these models are universal approximators for nonlinear systems

[26],[46]: any continuous causal functional can be approximated arbitrarily well by a

continuous-time, bilinear state space model within a bounded time interval. The discrete-time,

bilinear state space model is given by

(5-14)

Intuitively, it is expected that this model preserves the approximation capabilities of its

continuous-time counterpart. Unfortunately, this is not the case: it is not possible to

approximate all (nonlinear) discrete-time systems by discrete-time, bilinear models. The

reason for this is that the set of discrete-time, bilinear systems is not closed with respect to

the product operation: the product of the outputs of two discrete-time, bilinear state space

systems is not necessarily a bilinear system again [26]. In order to maintain the universal

AT T 1– AT= BT T 1– B= CT CT= DT D=

GT z( ) CT zInaAT–( ) 1–

BT DT+=

CT zInaT 1– AT–( )

1–T 1– B D+=

C zTT 1– TT 1– ATT 1––( )1–B D+=

G z( )=

xc t( )d

td-------------- Axc t( ) Buc t( ) Fxc t( ) uc t( )⊗+ +=

yc t( ) Cxc t( ) Duc t( )+=⎩⎪⎨⎪⎧

A �na na×

∈ B �na nu×

∈ C �ny na×

∈ D �ny nu×

∈ F �na nun

x t 1+( ) Ax t( ) Bu t( ) Fx t( ) u t( )⊗+ +=

y t( ) Cx t( ) Du t( )+=⎩⎨⎧

Page 105: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

The Quest for a Good Model Structure

89

approximation property also for discrete-time systems, a more generic class of models needs

to be defined: state affine models.

C. State Affine Models

A single input, single output, state affine model of degree is defined as

(5-15)

with , , , and . These models were

introduced in [73], and they pop up in a natural way in the description of sampled continuous-

time, bilinear systems [7],[59]. On a finite time interval and for bounded inputs, they can

approximate any continuous, discrete-time system arbitrarily well in uniform sense [26]. Just

as in the case of bilinear models, the states appear in the state and output equations in

an affine way. Hence, such a model structure enables the use of subspace identification

techniques to estimate the state space matrices [80].

D. Other Kinds of State Space Models

In literature, state space models come in many different flavours. In this section, we give a

non-exhaustive list of various existing approaches, and mention their most remarkable

properties.

The idea behind Linear Parameter Varying (LPV) models [80] is to create a linear, time-variant

model. Its parameters are a function of a user-chosen vector which characterizes

the operating point of the system, and which is assumed to be measurable. The state space

equations are an affine function of :

r

x t 1+( ) Aiui

t( )x t( )i 0=

r 1–

∑ Biui

t( )i 1=

r

∑+=

y t( ) Ciui

t( )x t( )i 0=

r 1–

∑ Diui

t( )i 1=

r

∑+=⎩⎪⎪⎨⎪⎪⎧

Ai �na na×

∈ Bi �na nu×

∈ Ci �ny na×

∈ Di �ny nu×

x t( )

p t( ) �s∈

p t( )

Page 106: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

90

(5-16)

with , and . The other state space matrices , , and

are partitioned in a similar way. Two special cases of the LPV model are the bilinear and the

state affine model. When is chosen equal to , and , , and

for , then the model equations become identical to the bilinear model equations

(5-14). The state affine description of degree is obtained from the LPV model by choosing

equal to a vector that contains all distinct nonlinear combinations of up to degree

. The LPV model structure is particularly interesting for nonlinear control: it enables the

use of different linear controllers at different operating points (i.e., gain scheduling).

Another kind of state space models are the so-called Local Linear Models (LLM) [80]. The idea

here is to partition the input space and the state space into operating regions in which a

particular linear model dominates. The state space matrices are defined as a sum of weighted

local linear models:

(5-17)

The scalar weighting functions generally have local support, like for instance radial basis

functions. The scheduling vector is a function of the input and the state .

The last type of nonlinear state space models discussed here are deterministic Neural State

Space models [77],[78]. The general nonlinear equations in (5-4) are parametrized by multi-

layer feedforward neural networks with hyperbolic tangents as activation functions:

(5-18)

x t 1+( ) A x t( )p t( ) x t( )⊗

B u t( )p t( ) u t( )⊗

+=

y t( ) C x t( )p t( ) x t( )⊗

D u t( )p t( ) u t( )⊗

+=⎩⎪⎪⎨⎪⎪⎧

A A0 A1 … As= Ai �na na×

∈ B C D

p t( ) u t( ) Bi 0= Ci 0= Di 0=

i 1 … s, ,=

r

p t( ) u t( )

r 1–

x t 1+( ) pi φt( ) Aix t( ) Biu t( ) Oi+ +( )i 1=

s

∑=

y t( ) pi φt( ) Cix t( ) Diu t( ) Pi+ +( )i 1=

s

∑=⎩⎪⎪⎨⎪⎪⎧

pi .( )

φt u t( ) x t( )

x t 1+( ) WABtanh VAx t( ) VBu t( ) βAB+ +( )=

y t( ) WCDtanh VCx t( ) VDu t( ) βCD+ +( )=⎩⎨⎧

Page 107: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

The Quest for a Good Model Structure

91

The model in (5-18) can be viewed as a multi-layer recurrent neural network with one hidden

layer. It is also a specific kind of NLq system, for which sufficient conditions for global

asymptotic stability were derived in [78]. Furthermore, the NLq theory allows to check and to

ensure the global asymptotic stability of neural control loops.

Page 108: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

92

5.3 Polynomial Nonlinear State Space Models

The approach in this thesis consists in starting from the general model

(5-19)

and to apply a functional expansion of the functions and . For this, we need to

choose a set of basis functions out of the many possibilities: sigmoid functions, wavelets,

radial basis functions, polynomials, hyperbolic tangents,... We opted for a polynomial

approach. The main advantage of polynomial basis functions is that they are straightforward

to compute, and easy to apply in a multivariable framework. We propose the following

notation for the Polynomial NonLinear State Space (PNLSS) model:

(5-20)

The coefficients of the linear terms in and are given by the matrices

and in the state equation, and and in the output

equation. The vectors and contain monomials in and ; the

matrices and contain the coefficients associated with those

monomials. The separation between the linear and the nonlinear terms in (5-20) is of no

importance for the behaviour of the model. However, later on in the identification procedure

this distinction will turn out to be very practical. The reason for this is that the first stage of

the identification procedure consists of estimating a linear model.

First, we briefly summarize the multinomial expansion theorem and the graded lexicographic

order, which are both useful concepts when dealing with multivariable monomials.

5.3.1 Multinomial Expansion Theorem

In order to denote monomials in an uncomplicated way, we first define the -dimensional

multi-index which contains the powers of a multivariable monomial:

x t 1+( ) f x t( ) u t( ) θ, ,( )=

y t( ) g x t( ) u t( ) θ, ,( )=⎩⎨⎧

f .( ) g .( )

x t 1+( ) Ax t( ) Bu t( ) Eζ t( )+ +=

y t( ) Cx t( ) Du t( ) Fη t( )+ +=⎩⎨⎧

x t( ) u t( ) A �∈na na×

B �na nu×

∈ C �ny na×

∈ D �ny nu×

ζ t( ) �∈nζ η t( ) �∈

nηx t( ) u t( )

E �∈na nζ×

F �ny nη×

n

α

Page 109: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Polynomial Nonlinear State Space Models

93

, (5-21)

with . A monomial composed of the components from the vector is then

simply written as

, (5-22)

where is the -th component of . The total degree of the monomial is given by

, (5-23)

and the factorial function of the multi-index is defined as

. (5-24)

Furthermore, we define as the column vector of all the distinct monomials of degree

(i.e., with multi-index ) composed from the elements of vector . The number of

elements in vector is given by the following binomial coefficient (see Appendix 5.A):

. (5-25)

Finally, the vector is defined as the column vector containing all the monomials of

degree two up to degree . The length of this vector is given by

. (5-26)

The notations introduced above can now be used to express the multinomial expansion

theorem [83].

α α1 α2 … αn=

αi �∈ ξ �n∈

ξα ξi

αi

i 1=

n

∏=

ξi i ξ

α αii 1=

n

∑=

α

α! α1!α2!…αn!=

ξ r( ) r

α r= ξ

ξ r( )

n r 1–+r⎝ ⎠

⎛ ⎞

ξ r{ }r

Ln r,n r+

r⎝ ⎠⎛ ⎞ 1– n–=

Page 110: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

94

Theorem 5.1 (Multinomial Expansion Theorem) The multinomial expansion

theorem gives an expression for the power of a sum, as a function of the powers of the

terms:

(5-27)

5.3.2 Graded Lexicographic Order

To assemble the monomials in a deterministic way, it is convenient to define an order of

succession among the monomials. A possible choice is to utilize the lexicographic (or

alphabetical) order. For this, a sequence is chosen between the symbols

. The most trivial choice is to base the ordering on the index of :

(5-28)

When the symbols are combined into a word, we arrange them according to this order. For

instance, the disordered monomial should be written as . Furthermore,

monomials of the same degree can be ordered like words in a dictionary, e.g.

. (5-29)

Monomials of different degrees are placed in groups of increasing degree. Within each

degree, the lexicographic order is applied. This results in the so-called graded lexicographic

order. We will use this to order the elements of the vector , which contains all monomials

with a degree between two and . For instance, the vector with denotes

(5-30)

5.3.3 Approximation Behaviour

When a full polynomial expansion of (5-19) is carried out, all monomials up to a chosen

degree must be taken into account. First, we define as the concatenation of the state

vector and the input vector:

ξii 1=

n

∑⎝ ⎠⎛ ⎞ k k!

α!-----ξα

α k=∑=

ξ1{ } ξ2{ } … ξn{ }, , , i ξi

ξ1{ } ξ2{ } … ξn{ }< < <

ξi

ξ3ξ12ξ2 ξ1ξ1ξ2ξ3{ }

ξ1ξ1ξ2{ } ξ1ξ1ξ3{ } ξ2ξ2ξ3{ }< <

ξ r{ }r ξ 3{ } n 2=

ξ 3{ } ξ 2( ) ξ 3( );=

ξ12 ξ1ξ2 ξ2

2 ξ1

3 ξ1

2ξ2 ξ1ξ22 ξ2

3

T=

r ξ t( )

Page 111: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Polynomial Nonlinear State Space Models

95

. (5-31)

As a consequence, the dimension of the vector is given by . Then, we

define and in equation (5-20) as

. (5-32)

This is our default choice for the PNLSS model structure. The total number of parameters

required by the model in (5-20) is given by

. (5-33)

When all the nonlinear combinations of the states are present in and for a given

degree, then the proposed model structure is invariant under a similarity transform. Since the

elements of the transform matrix can be chosen freely provided that is non singular,

the effective number of parameters becomes

. (5-34)

A. The PNLSS Approach versus State Affine Models

The question we want to answer is what the approximation properties are of this model

structure. When taking a closer look at (5-15) (“State Affine Models” on p. 89), we observe

that the State Affine (SA) representation forms a subclass of the default PNLSS model

structure. Therefore, the PNLSS model structure inherits its approximation properties from the

state affine framework. The remaining question is then what the additional advantage is of

the PNLSS approach over the state affine representation. To investigate this, we recapitulate a

derivation given in [59] for a SISO system. This derivation starts from a polynomial expansion

of degree of the general state space equations (5-19). This expansion is expressed by

Kronecker products:

ξ t( ) x1 t( ) … xnat( ) u1 t( ) … unu

t( )T

=

ξ t( ) n na nu+=

ζ t( ) η t( )

ζ t( ) η t( ) ξ t( ) r{ }= =

n r +

r⎝ ⎠⎜ ⎟⎛ ⎞

1–⎝ ⎠⎜ ⎟⎛ ⎞

na ny+( )na nu r+ +

r⎝ ⎠⎜ ⎟⎛ ⎞

1–⎝ ⎠⎜ ⎟⎛ ⎞

na ny+( )=

ζ t( ) η t( )

na2 T T

na nu r+ +

r⎝ ⎠⎜ ⎟⎛ ⎞

1–⎝ ⎠⎜ ⎟⎛ ⎞

na ny+( ) na2–

2r

Page 112: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

96

(5-35)

where and are the Taylor series coefficients of and , respectively, and is

defined as follows:

(5-36)

The notation with Kronecker products is more elegant than the one we proposed in (5-20),

but it has the disadvantage that redundant monomials are present in the vector .

However, for the ideas developed here the Kronecker notation is well suited.

Next, difference equations are developed for . For instance, for we

have

(5-37)

Still following the calculations in [59] and using implicit summation, this results in a difference

equation of the form

. (5-38)

We apply the same procedure for , and so on. Furthermore, a new state vector is

defined:

x t 1+( ) Fijxi( )

t( )ujt( )

j 0=

r

∑i 0=

r

∑= with F00 0=

y t( ) Gijxi( )

t( )ujt( )

j 0=

r

∑i 0=

r

∑= with G00 0=⎩⎪⎪⎨⎪⎪⎧

Fij Gij f g x i( ) t( )

x 2( ) t( ) x t( ) x t( )⊗=

x r( ) t( ) x t( ) … x t( )⊗ ⊗=

x i( ) t( )

x i( ) t 1+( ) x 2( ) t 1+( )

x 2( ) t 1+( ) x t 1+( ) x t 1+( )⊗=

Fijxi( ) t( )uj t( )

j 0=

r

∑i 0=

r

∑⎝ ⎠⎛ ⎞ Fijx

i( ) t( )uj t( )j 0=

r

∑i 0=

r

∑⎝ ⎠⎛ ⎞⊗=

x 2( ) t 1+( ) Fkm Fqn⊗k q+ i=

m n+ j=

∑ x i( ) t( )uj t( )i j 0≥,∑=

x 3( ) t 1+( )

Page 113: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Polynomial Nonlinear State Space Models

97

. (5-39)

Note that this state vector is non minimal for , because it contains identical elements

due to the redundant monomials of the Kronecker representation. Finally, the terms in (5-38)

with a nonlinear degree greater than (i.e., the terms for which ) are neglected. This

implies that the approximation of the system is actually of degree . The following state affine

model is then obtained:

(5-40)

B. Comparison of the Number of Parameters

In (5-39), we observe that the number of states in the state affine approximation grows

combinatorially with the degree of approximation . This is the price to be paid for the state

affine representation. To calculate the number of required states, the redundant states that

originate from the use of the Kronecker product, need to be taken into account. The total

number of distinct states of model (5-40) is:

(5-41)

For a SISO state affine model, there are matrix coefficients per set of state

affine matrices , and in total there are such sets. We also have to take into

account the similarity transform. Hence, the actual number of parameters is given by

. (5-42)

We will now compare this to the number of parameters that are required for a PNLSS

approximation of degree . In Figure 5-1, the ratio between expressions (5-42) (state affine

x t( )x

1( )t( )

xr( )

t( )

=

na 1>

r i j+ r>

r

x t 1+( ) Aiui

t( )x t( )i 0=

r 1–

∑ Biui

t( )i 1=

r

∑+=

y t( ) Ciui

t( )xt( )

i 0=

r 1–

∑ Diui

t( )i 1=

r

∑+=⎩⎪⎪⎨⎪⎪⎧

r

n

n

na r+

r⎝ ⎠⎜ ⎟⎛ ⎞

1–=

n2 2n 1+ +

AiBiCiDi r

r n2 2n 1+ +( ) n

2–

r

Page 114: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Chapter 5: Nonlinear State Space Modelling of Multivariable Systems

98

approach) and (5-33) (PNLSS approach) is shown for various system orders ( ),

and for different degrees of approximation ( ). For (red line), the ratio is

one. This is a natural result since it corresponds to a linear model which has the same order

for both approximations. For , it can be seen from Figure 5-1 that the ratio is always

higher than one.

C. Conclusion

We have shown that, for an approximation of the same quality, the PNLSS model always

requires a lower number of parameters than the state affine model. For this reason, we prefer

to use the PNLSS model structure over the state affine one.

5.3.4 Stability

The only recursive relation which is present in the general state space model (5-19) is the

state equation. Hence, the stability of the model only depends on the function , the initial

conditions of the state, and the properties of the input signal. Therefore, when analysing the

stability of (5-19), it suffices to study the following equation

, (5-43)

with . The concept of Input-to-State Stability (ISS) reflects this idea. It was

introduced in [74] for continuous-time systems, and extended to the discrete-time case in

na 1 … 10, ,=

r 1 … 5, ,= r 1=

r 1>

2 4 6 8 10

100

101

102

103

Order na of the approximated system

Ratio n

um

. par.

SA/P

NLSS

r=1

r=2

r=3

r=4

r=5

Figure 5-1. Ratio of the number of parameters for the SA approximation and the PNLSS approximation, for different degrees .r

f

x t 1+( ) f x t( ) u t( ),( )=

x 0( ) x0=

Page 115: Identification of Nonlinear Systems using Polynomial ... · 2.4.1 Response to a Sine Wave 25 2.4.2 Even and Odd Nonlinear Behaviour 25 2.4.3 The Multisine as a Detection Tool for

Polynomial Nonlinear State Space Models

99

[32]. In order to define ISS, the following notations and definitions are used: denotes the

set of all non negative integers. The set of all input functions with the norm

is denoted by , and is the Euclidean norm. The initial

state is given by .

Definition 5.2 A function is a -function if it is continuous, strictly

increasing and if . A function is a -function if for

each fixed , the function is a -function, and if for each fixed , the

function is decreasing, and if as .

Definition 5.3 (Input-to-State-Stability) System (5-43) is globally ISS, if there

exist a -function and a -function , such that for each input and each

it holds that

(5-44)

for each .

Loosely explained, a system is ISS if every state trajectory corresponding to a bounded input

remains bounded, and if the trajectory eventually becomes small when the input signal

becomes small as well, independent of the initial state. In this thesis, we will not try to find

such functions and .

5.3.5 Some Remarks on the Polynomial Approach

A. Orthogonal Polynomials

The question addressed here is whether the use of orthogonal polynomials can create some

additional value to the identification of the PNLSS model. Orthogonal polynomials have proved

their usefulness in times when computing power and memory were scarce. For linear

problems, the orthogonality of regressors induces two advantages. First of all, the re-

estimation of parameters is circumvented when new regressors are added to an already

solved problem. Unfortunately, this asset is of no use here, since the identification of the

proposed model (5-20) requires solving a nonlinear problem (see “Identification of the PNLSS

Model” on p. 115). Secondly, orthogonality can improve the numerical conditioning. In this


respect, it should be noted that orthogonal basis functions only offer a clear advantage when

they are applied to signals with a given probability density function. For instance, Hermite

polynomials and Chebyshev polynomials are well suited for Gaussian and uniformly distributed

signals, respectively. Bearing in mind the class of excitation signals we chose in Chapter 2, it

would be logical to select Hermite polynomials. However, the states, which are polynomial

functions of the input and the previous states, do not necessarily have a Gaussian distribution.

Therefore, we will employ ordinary polynomials, since no clear advantage can be taken from

the application of orthogonal polynomials.

B. Disadvantages of the Polynomial Approach

A number of drawbacks come along with the application of polynomials. The most important

one is the explosive behaviour of polynomials outside the region in which they were

estimated. Indeed, a polynomial quickly attains large numerical values when its arguments

are large. At first sight, this might seem a serious drawback compared to the well-behaved

extrapolation of basis functions, which tend to a constant value for large input values.

However, it should be noted that, in general, it is never a good idea to extrapolate with an

estimated model. This fact is independent of the chosen basis functions, whether they are

polynomials, hyperbolic tangents, radial basis functions, or sigmoids. The only exception to

this rule is when there exists an exact match between the DUT’s internal structure and the

model structure. In a black box framework, this is seldom the case. We illustrate this rule of

thumb by means of a short simulation, where we use two kinds of basis functions to

approximate the relation

y = arctan(u).    (5-45)

To generate the estimation data set, an input signal of 2000 samples, uniformly distributed between u = −5 and u = 5, is used. First, we estimate a 15th degree polynomial using linear least squares. Then, a Gaussian Radial Basis Function (RBF) network with 8 centres is estimated with the RBF Matlab toolbox [51]. Both models require the estimation of 16 parameters. Next, we evaluate the extrapolation behaviour of both models by applying an input between u = −10 and u = 10. The result of this test is shown in Figure 5-2. The top plot shows the original function (solid black line), together with the polynomial approximation (dashed grey line) and the RBF approximation (solid grey line). The bottom plot shows the error of both approximations on a logarithmic scale. Although the output of the RBF model


does not explode like the polynomial approach does (as a matter of fact, it converges to zero for u → ±∞), it still exhibits severe extrapolation errors close to the estimation region. We conclude that the use of ‘well-behaved’ basis functions like RBFs, hyperbolic tangents, or sigmoids is no justification to employ them for extrapolation.

Figure 5-2. Top: arctangent function (solid black line); polynomial approximation (dash-dotted grey line); Gaussian RBF approximation (solid grey line). Bottom: model error for both approximations [dB], as a function of u.
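A minimal sketch of this experiment (not the thesis code: the RBF part here fixes 8 hand-placed Gaussian centres and a common width, and fits only the 8 weights by least squares, instead of using the RBF Matlab toolbox [51]) reproduces the qualitative behaviour:

```python
import numpy as np

rng = np.random.default_rng(0)
u_est = rng.uniform(-5, 5, 2000)          # estimation input, uniform in [-5, 5]
y_est = np.arctan(u_est)

# 15th-degree polynomial (16 parameters), linear least squares
poly_coef = np.polyfit(u_est, y_est, 15)

# Gaussian RBF model: centres and width chosen by hand (assumption of this sketch)
centres = np.linspace(-5, 5, 8)
width = 1.5
def rbf_design(u):
    return np.exp(-((u[:, None] - centres[None, :]) / width) ** 2)
w, *_ = np.linalg.lstsq(rbf_design(u_est), y_est, rcond=None)

# Extrapolation test on [-10, 10]
u_test = np.linspace(-10, 10, 2001)
y_true = np.arctan(u_test)
err_poly = np.abs(np.polyval(poly_coef, u_test) - y_true)
err_rbf = np.abs(rbf_design(u_test) @ w - y_true)

inside = np.abs(u_test) <= 5
for name, err in [("poly", err_poly), ("rbf", err_rbf)]:
    print(f"{name}: max error inside {err[inside].max():.2e}, outside {err[~inside].max():.2e}")
```

Inside the estimation region both errors are small; outside it, the polynomial error grows explosively while the RBF error saturates but is still large, in line with Figure 5-2.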


5.4 On the Equivalence with some Block-oriented Models

In the past years, simple block-oriented models have been utilized extensively to model

nonlinear systems. The block-oriented models most commonly used are the Hammerstein,

Wiener, Wiener-Hammerstein, and the Nonlinear Feedback model. Numerous applications of

these models in various fields can be found in the literature: the modelling of heat exchangers

[10], transmission lines [58], chemical processes [49], and biological systems [30].

Furthermore, various identification methods for block-oriented models exist [2],[3],[11]. In

the following sections, we will establish a link between the Polynomial NonLinear State Space

(PNLSS) model and a number of standard block-oriented models. We will restrict ourselves to

SISO systems, because there exists no rigorous definition for MIMO block-oriented models:

when the number of system inputs is different from the number of outputs, it is not clear

which dimensions the intermediate signals (i.e., the signals between the blocks) should have.

Furthermore, a distinction will be made between the Hammerstein, Wiener, and Wiener-

Hammerstein system. The first two systems can be considered as a special case of the

Wiener-Hammerstein system, but making the distinction renders the analysis simpler and easier to interpret.

5.4.1 Hammerstein

A Hammerstein system consists of a static nonlinearity followed by a linear dynamic system

(see Figure 5-3). A typical example where this model is utilized is in the case of a non-ideal

sensor exhibiting a static nonlinear effect, which is followed by a transmission line showing a

linear dynamic behaviour.

Figure 5-3. Hammerstein system.

In general, the input signal u is distorted by a static nonlinearity P, resulting in the intermediate signal v, which is then filtered by a linear system G0(z). The linear system is


parametrized as an n_a-th order linear state space model with parameters {A0, B0, C0, D0}. For the parametrization of the static nonlinearity, we rely on the Weierstrass theorem.

Theorem 5.4 (Weierstrass Approximation Theorem) Let f be a continuous function on a closed interval [a, b]. Then, given any ε > 0, there exists a polynomial P of degree r such that

|f(x) − P(x)| < ε    (5-46)

for all x in [a, b].

In other words, a continuous function on a closed interval can be uniformly approximated by polynomials [35]. Hence, the static nonlinearity in Figure 5-3 is parametrized as a polynomial with coefficients p_i.

The following equations describe the Hammerstein system:

v(t) = ∑_{i=1}^{r} p_i u^i(t)    (5-47)

x0(t+1) = A0 x0(t) + B0 v(t)
y(t) = C0 x0(t) + D0 v(t)    (5-48)

The substitution of (5-47) in (5-48) results in a set of equations identical to (5-20), when we define the system matrices in (5-20) as

A = A0,  B = p1 B0,  C = C0,  D = p1 D0,
E = [ p2 B0   …   pr B0 ],  F = [ p2 D0   …   pr D0 ],    (5-49)

and the vectors of monomials as

ζ(t) = η(t) = u(t)^{{r}} = [ u²(t)   …   u^r(t) ]^T.    (5-50)


For the Hammerstein system, ζ(t) and η(t) are a subset of the polynomial vector functions defined in (5-32). Therefore, we can conclude that a Hammerstein system with a continuous static nonlinearity can be represented by the PNLSS model in (5-20).
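A minimal numerical check of this equivalence (not from the thesis; an arbitrary first-order example with a cubic nonlinearity, r = 3) simulates the Hammerstein structure and the PNLSS model built from (5-49)-(5-50) and compares the outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
A0, B0, C0, D0 = 0.7, 1.0, 2.0, 0.5     # first-order SISO linear block
p = np.array([1.0, -0.3, 0.1])          # p1, p2, p3 of the static polynomial
u = rng.standard_normal(200)

# Direct Hammerstein simulation: v = P(u), then G0
x, y_ham = 0.0, []
for ut in u:
    v = p[0] * ut + p[1] * ut**2 + p[2] * ut**3
    y_ham.append(C0 * x + D0 * v)
    x = A0 * x + B0 * v

# Equivalent PNLSS matrices per (5-49)/(5-50): zeta = eta = [u^2, u^3]
A, B, C, D = A0, p[0] * B0, C0, p[0] * D0
E = np.array([p[1] * B0, p[2] * B0])
F = np.array([p[1] * D0, p[2] * D0])

x, y_pnlss = 0.0, []
for ut in u:
    zeta = np.array([ut**2, ut**3])
    y_pnlss.append(C * x + D * ut + F @ zeta)
    x = A * x + B * ut + E @ zeta

print("max |difference| =", np.max(np.abs(np.array(y_ham) - np.array(y_pnlss))))
```

The printed difference is zero up to round-off, since the two parametrizations describe the same input/output map.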

5.4.2 Wiener

A Wiener system is composed of a linear dynamic block G0(z) followed by a static nonlinear block P, as shown in Figure 5-4.

Figure 5-4. Wiener system.

The equations that describe the behaviour of a Wiener system are

x0(t+1) = A0 x0(t) + B0 u(t)
v(t) = C0 x0(t) + D0 u(t)    (5-51)

y(t) = ∑_{i=1}^{r} p_i v^i(t)    (5-52)

By substituting the second equation of (5-51) in (5-52), and by applying the Multinomial Expansion (see “Multinomial Expansion Theorem” on p. 92), we find the following set of system matrices:

A = A0,  B = B0,  C = p1 C0,  D = p1 D0,  E = 0,
F = [ p2 C0(1)²   2 p2 C0(1) C0(2)   …   r pr C0(n_a) D0^{r−1}   pr D0^r ]    (5-53)

and

ζ(t) = 0,  η(t) = ξ(t)^{{r}}.    (5-54)


A similar conclusion as for the Hammerstein system can be drawn: ζ(t) and η(t) in (5-54) are a subset of the polynomial vector functions defined in (5-32). Hence, Wiener systems with a continuous static nonlinearity can be represented using the PNLSS approach.

5.4.3 Wiener-Hammerstein

Wiener-Hammerstein systems are defined as a static nonlinear block P sandwiched between two linear dynamic blocks G1(z) and G2(z) with orders n1 and n2, respectively (see Figure 5-5). The intermediate signals are denoted as v(t) and w(t).

Figure 5-5. Wiener-Hammerstein system.

The system equations are:

x1(t+1) = A1 x1(t) + B1 u(t)
v(t) = C1 x1(t) + D1 u(t)    (5-55)

w(t) = ∑_{i=1}^{r} p_i v^i(t)    (5-56)

x2(t+1) = A2 x2(t) + B2 w(t)
y(t) = C2 x2(t) + D2 w(t)    (5-57)

These equations are combined in order to obtain the representation of (5-20). For this, the state vectors x1(t) and x2(t) are merged into the new state vector x(t). Again, the equivalence holds, and we obtain the system matrices in (5-58).


A = [ A1          0_{n1×n2}
      p1 B2 C1    A2        ],    B = [ B1 ; p1 D1 B2 ],    C = [ p1 D2 C1   C2 ],    D = p1 D1 D2,

E = [ 0_{n1×1}   0_{n1×1}   …   0_{n1×1}   0_{n1×1}
      p2 B2 C1(1)²   2 p2 B2 C1(1) C1(2)   …   r pr B2 C1(n1) D1^{r−1}   pr B2 D1^r ],

F = D2 [ p2 C1(1)²   2 p2 C1(1) C1(2)   …   r pr C1(n1) D1^{r−1}   pr D1^r ]    (5-58)

The vectors of monomials are defined as

ζ(t) = η(t) = ξ'(t)^{{r}},    (5-59)

where

ξ'(t) = [ x1(t)   …   x_{n1}(t)   u(t) ]^T.    (5-60)

5.4.4 Nonlinear Feedback

In this section, we will discuss a simple (I) and a more general (II) type of Nonlinear

Feedback system. The first system is shown in Figure 5-6, and is referred to as NLFB I.

It is described by the following equations:

x0(t+1) = A0 x0(t) + B0 v(t)
y(t) = C0 x0(t) + D0 v(t)    (5-61)

Figure 5-6. Nonlinear Feedback I.


v(t) = u(t) − ∑_{i=1}^{r} p_i y^i(t)    (5-62)

After substitution of (5-62) in (5-61), we obtain:

x0(t+1) = A0 x0(t) + B0 ( u(t) − ∑_{i=1}^{r} p_i y^i(t) )
y(t) = C0 x0(t) + D0 ( u(t) − ∑_{i=1}^{r} p_i y^i(t) )    (5-63)

The last equation of (5-63) is a nonlinear algebraic equation due to the presence of the direct

term coefficient D0. This coefficient renders the system incompatible with the PNLSS model.

For the more general Nonlinear Feedback system (see Figure 5-7), similar nonlinear algebraic

equations pop up due to the direct term of the linear subsystems.

In order to continue the analysis, the following assumptions are made:

Assumption 5.5 (Delay in system NLFB I) A delay is present in the linear dynamic block G0(z), i.e., D0 = 0.

Assumption 5.6 (Delay in system NLFB II) A delay is present in at least one of the linear blocks G1(z), G2(z), or G3(z). This is equivalent to D1 = 0, D2 = 0, or D3 = 0.

Assumption 5.5 and Assumption 5.6 are true, when for instance a digital controller is present

somewhere in the feedback loop. If this is not the case, we will still assume that a zero direct

Figure 5-7. Nonlinear Feedback II.


term is present in one of the linear blocks. Note that a delay is always present in real-life

systems. If the sampling frequency is sufficiently high, the delay will be of the same order of magnitude as one delay tap. When this condition is not fulfilled, the data can be upsampled in

order to achieve a negligible direct term [12]. The equations in (5-63) are therefore reduced

to:

x0(t+1) = A0 x0(t) + B0 ( u(t) − ∑_{i=1}^{r} p_i (C0 x0(t))^i )
y(t) = C0 x0(t)    (5-64)

These system equations are equivalent to (5-20) using the following system parameters:

A = A0 − p1 B0 C0,  B = B0,  C = C0,  D = 0,  F = 0,
E = −B0 [ p2 C0(1)²   2 p2 C0(1) C0(2)   …   r pr C0(n_a−1) C0^{r−1}(n_a)   pr C0^r(n_a) ]    (5-65)

and the following vectors of monomials:

ζ(t) = x(t)^{{r}},  η(t) = 0.    (5-66)

To prove the equivalence for the system NLFB II, different polynomial vector maps are necessary. They depend on the position of the delay in the feedback loop (i.e., the linear system for which D_i is assumed to be zero). In (5-67), we define three vectors composed of the state vectors of the linear systems G1(z), G2(z), and G3(z):

ξ1(t) = [ x1(t)   x2(t) ]^T,   ξ2(t) = x2(t),   ξ3(t) = [ x1(t)   x2(t)   x3(t)   u(t) ]^T.    (5-67)

The necessary monomials ζ(t) and η(t) are listed in Table 5-2.

          D1 = 0          D2 = 0          D3 = 0
ζ(t)      ξ1(t)^{{r}}     ξ2(t)^{{r}}     ξ3(t)^{{r}}
η(t)      0               ξ2(t)^{{r}}     0

Table 5-2. Required monomials as a function of the position of the delay.
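As a quick numerical check of the NLFB I equivalence (not from the thesis; an arbitrary first-order example with D0 = 0 and r = 3), the feedback structure and the PNLSS model of (5-65)-(5-66) can be simulated side by side:

```python
import numpy as np

rng = np.random.default_rng(2)
A0, B0, C0 = 0.6, 1.0, 1.5              # first-order G0 with D0 = 0 (Assumption 5.5)
p = np.array([0.8, 0.2, -0.05])         # p1, p2, p3 of the feedback polynomial
u = 0.5 * rng.standard_normal(300)

# Direct simulation of the feedback structure: v = u - P(y)
x, y_fb = 0.0, []
for ut in u:
    y = C0 * x
    y_fb.append(y)
    v = ut - (p[0] * y + p[1] * y**2 + p[2] * y**3)
    x = A0 * x + B0 * v

# Equivalent PNLSS model per (5-65)/(5-66): zeta = [x^2, x^3], eta = 0
A = A0 - p[0] * B0 * C0
B, C = B0, C0
E = -B0 * np.array([p[1] * C0**2, p[2] * C0**3])

x, y_pnlss = 0.0, []
for ut in u:
    y_pnlss.append(C * x)
    x = A * x + B * ut + E @ np.array([x**2, x**3])

print("max |difference| =", np.max(np.abs(np.array(y_fb) - np.array(y_pnlss))))
```

Again the two simulations coincide up to round-off, since the polynomial feedback is absorbed into the A and E matrices.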


5.4.5 Conclusion

We have established a link between a number of standard block-oriented models and the

PNLSS model. The results for the different nonlinear block structures are summarized in Table

5-3 and Table 5-4. In each row, the required monomials for the PNLSS approach are

presented.

Block-oriented models undeniably give the user the most physical insight. From an identification point of view, they often require fewer parameters than the PNLSS approach. The open loop block-oriented models have the advantage that the intermediate signals are computed in a non-recursive way during the estimation and the simulation. Therefore,

their stability is simple to check and to ensure. On the other hand, block-oriented models

require prior knowledge about the structure of the device, which is not always easy to obtain.

For some block-oriented models, like the Wiener-Hammerstein system or the Nonlinear

Feedback Structure, initial values are not always straightforward to obtain.

The PNLSS model is inherently compatible with MIMO systems, and it does not need any prior

knowledge. The price paid for this flexibility is the explosion of the number of required model

parameters. The pros and cons of both approaches are summarized in Table 5-5. To conclude,

                              ζ(t)             η(t)
Hammerstein (5-49)            u(t)^{{r}}       u(t)^{{r}}
Wiener (5-53)                 0                ξ(t)^{{r}}
Wiener-Hammerstein (5-58)     ξ'(t)^{{r}}      ξ'(t)^{{r}}

Table 5-3. PNLSS monomials for open loop block-oriented models.

                              ζ(t)             η(t)
NLFB I (5-65)                 x(t)^{{r}}       0
NLFB II, D1 = 0 (5-67)        ξ1(t)^{{r}}      0
NLFB II, D2 = 0 (5-67)        ξ2(t)^{{r}}      ξ2(t)^{{r}}
NLFB II, D3 = 0 (5-67)        ξ3(t)^{{r}}      0

Table 5-4. PNLSS monomials for feedback block-oriented models.


neither of the two approaches is clearly better than the other. For this reason, the user should choose an appropriate model structure according to his/her needs.

                              Block-Oriented    State Space
Physical interpretation            ✓
Number of parameters               ✓
Flexibility of the model                             ✓
Model initialization                                 ✓

Table 5-5. Comparison of the block-oriented approach vs. the state space approach.


5.5 A Step beyond the Volterra Framework

By means of two simple examples, we illustrate that systems which fit into the polynomial nonlinear state space model structure do not necessarily belong to the Volterra framework which was set up in the introductory chapter.

5.5.1 Duffing Oscillator

The Duffing oscillator is a second order nonlinear dynamic system which is excited with a harmonic signal. Its behaviour is described by the following differential equation:

d²y_c/dt² + a dy_c/dt + b y_c + c y_c³ = d cos(ωt),    (5-68)

where a, b, and c are the system parameters. The amplitude of the sinusoidal signal is determined by the parameter d. Depending on the value of this parameter, several kinds of output behaviour can occur, such as ordinary harmonic output, period doubling, period quadrupling, and even chaotic behaviour. The Duffing equation can also be written in a state space form:

dX1(t)/dt = X2(t)
dX2(t)/dt = d u_c(t) − a X2(t) − b X1(t) − c X1³(t)    (5-69)

where u_c(t) = cos(ωt). Next, this continuous-time model is converted into a discrete-time model using the Euler rule with a time step h:

dX(t)/dt = f(X(t), u_c(t))   ⇒   x(t+1) = x(t) + h f(x(t), u(t)),    (5-70)

such that

X(th) ≅ x(t),    (5-71)

and


u(t) = u_c(th).    (5-72)

We apply this principle to (5-69), and obtain

x1(t+1) = x1(t) + h x2(t)
x2(t+1) = h d u(t) + (1 − h a) x2(t) − h b x1(t) − h c x1³(t)    (5-73)

The discrete-time model in (5-73) might be a poor approximation of its continuous-time counterpart due to the simplicity of the differentiation rule. However, in what follows the focus will lie solely on the behaviour of the discrete-time model, and not on the relation between (5-69) and (5-73). From (5-73), it is clear that this system belongs to the PNLSS model class. In the next simulation, these equations are simulated during N_sim = 10^6 iterations using the following settings:

a = 1,  b = −10,  c = 100,  d = 0.82,  ω = 3.5,  h = 2π / (ωN),    (5-74)

Figure 5-8. Top plot: state trajectory x2(t) of the discretized Duffing oscillator; bottom plot: DFT of the state trajectory (grey) and of the input signal (black), as a function of the angular frequency [rad/s].


with N = 4096. The value of the time step h is chosen such that no leakage is present in the calculated DFTs. In Figure 5-8, the state trajectory of x2(t) during the first 100 seconds is shown (top plot). In the bottom plot, the DFT of the last 8N samples of the state trajectory is plotted (grey), together with the DFT of the input signal (black). From this figure, we observe that besides harmonic components, subharmonic components are also present: the harmonic lines corresponding to ω/4, ω/2, and 3ω/4 are also excited. Although the DUT can be represented by the PNLSS model, it does not fit into the Volterra framework.
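A minimal simulation sketch of (5-73) with the settings of (5-74) (not the thesis code; it uses fewer samples than the 10^6 iterations above and assumes zero initial conditions, so the subharmonic lines at multiples of ω/4 should appear in the spectrum once the transient has died out):

```python
import numpy as np

a, b, c, d, w = 1.0, -10.0, 100.0, 0.82, 3.5
N = 4096                        # samples per period of the excitation
h = 2 * np.pi / (w * N)         # time step: no leakage in the DFT
n_sim = 64 * N                  # shorter than the thesis simulation

x1, x2 = 0.0, 0.0
x2_traj = np.empty(n_sim)
for t in range(n_sim):
    u = np.cos(w * t * h)
    x1, x2 = (x1 + h * x2,
              h * d * u + (1 - h * a) * x2 - h * b * x1 - h * c * x1 ** 3)
    x2_traj[t] = x2

# DFT of the last 8N samples (steady state); angular frequencies in rad/s
X2 = np.fft.rfft(x2_traj[-8 * N:])
freqs = np.fft.rfftfreq(8 * N, d=h) * 2 * np.pi
peaks = np.argsort(np.abs(X2))[-6:]
print(np.sort(np.round(freqs[peaks], 3)))   # dominant lines near multiples of w/4 = 0.875
```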

5.5.2 Lorenz Attractor

In [40], E. N. Lorenz studied the nonlinear differential equations that describe the behaviour

of a forced, dissipative hydrodynamic flow. The solutions of these equations appeared to be

extremely sensitive to minor changes of the initial conditions. This is the so-called butterfly

effect. Often, the behaviour of this system is referred to as chaotic, although it is described by the following deterministic model equations:

dX1(t)/dt = σ (X2(t) − X1(t))
dX2(t)/dt = X1(t) (ρ − X3(t)) − X2(t)
dX3(t)/dt = X1(t) X2(t) − b X3(t)    (5-75)

where the model parameters are given by ρ = 28, σ = 10, and b = 8/3. Like in the previous section, we convert these equations into a discrete-time description by applying the Euler differentiation method with time step h:

x1(t+1) = h σ (x2(t) − x1(t)) + x1(t) + h u(t)
x2(t+1) = h ( x1(t) (ρ − x3(t)) − x2(t) ) + x2(t)
x3(t+1) = h ( x1(t) x2(t) − b x3(t) ) + x3(t)    (5-76)

with X_i(th) ≅ x_i(t). An input term h u(t) is added to the first state equation, such that initial conditions can be imposed on the system. It can easily be seen that these equations fit into the proposed PNLSS model structure. As with the Duffing oscillator, we are not interested in a


perfect match between (5-75) and (5-76). In the following simulation, we use a time step h = 0.01, apply an impulse with amplitude A = 0.01 on the state x1(t), and simulate the equations in (5-76) during N_sim = 10 000 samples. The resulting state trajectory is shown in Figures 5-9 and 5-10. The chaotic behaviour of this model contradicts the Volterra framework, since here the response to a periodic input is definitely not periodic. This example illustrates that the PNLSS model structure is ‘richer’ than the Volterra framework.

Figure 5-9. State trajectory of the discretized Lorenz attractor (2D plot): x1(t), x2(t), and x3(t) as a function of time [s].

Figure 5-10. State trajectory of the discretized Lorenz attractor (3D plot).
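A short sketch of this simulation (not the thesis code) iterates (5-76) with the impulse applied to the first state:

```python
import numpy as np

rho, sigma, b = 28.0, 10.0, 8.0 / 3.0
h, n_sim = 0.01, 10_000

x = np.zeros((n_sim + 1, 3))
u = np.zeros(n_sim)
u[0] = 0.01                               # impulse of amplitude 0.01 on x1

for t in range(n_sim):
    x1, x2, x3 = x[t]
    x[t + 1] = [h * sigma * (x2 - x1) + x1 + h * u[t],
                h * (x1 * (rho - x3) - x2) + x2,
                h * (x1 * x2 - b * x3) + x3]

# x[:, 0], x[:, 1], x[:, 2] trace the butterfly-shaped attractor (cf. Figures 5-9/5-10)
```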


5.6 Identification of the PNLSS Model

In this part of the chapter, an identification procedure for the model in (5-20) is proposed. It

consists of four major steps. First, we estimate, in mean square sense, the Best Linear

Approximation (BLA) of the plant. Then, a parametric linear model is estimated from the BLA

using frequency domain subspace techniques. This is immediately followed by a nonlinear

optimization to improve the linear model estimates. The last step consists of a nonlinear

optimization procedure in order to obtain the parameters of the full nonlinear model.

5.6.1 Best Linear Approximation

For calculating the Best Linear Approximation of the Device Under Test (DUT), we refer to

Chapter 2 (see “Estimating the Best Linear Approximation” on p. 28). The procedure

explained there converts the measured input/output data into a nonparametric linear model G(k), with its sample covariance denoted by C_G(k). G(k) is given in the form of a Frequency Response Function (FRF). This data reduction step offers a number of advantages.

First of all, the Signal to Noise Ratio (SNR) is enhanced. Secondly, it allows the user to select,

in a straightforward way, a frequency band of interest. Finally, when periodic data are

available, the measurement noise and the effect of the nonlinear behaviour can be separated.

5.6.2 Frequency Domain Subspace Identification

The next step is to transform the nonparametric estimate G(k) into a parametric model. The purpose is to estimate a linear, discrete-time state space model from G(k), taking into account the covariance matrix C_G(k). For this, we make use of the frequency domain

subspace algorithm in [44] which allows to incorporate covariance information for non

uniformly spaced frequency domain data. Furthermore, we rely on the results presented in

[53], where the stochastic properties of this algorithm were analysed for the case in which the

sample covariance matrix is employed instead of the true covariance matrix. In this section,

we briefly recapitulate the algorithm and the model equations on which the algorithm is

based.

A. Model Equations

First, consider the DFT in N samples of the state space equations (5-9):


z_k X(k) = A X(k) + B U(k) + z_k x_I
Y(k) = C X(k) + D U(k)    (5-77)

for k = 1, …, F, and with z_k = e^{jω_k T_s}. In the transient term z_k x_I, x_I is defined as

x_I = (1/√N) ( x(0) − x(N) ).    (5-78)

In what follows, we will neglect the transient term. The procedures that determine the BLA result in an estimate in the form of an FRF. Hence, we rewrite (5-77) into the FRF form as well. This is done by setting U(k) = I_{n_u}. The plant model then looks as follows:

z_k X(k) = A X(k) + B
G(k) = C X(k) + D    (5-79)

with the state matrix X(k) ∈ ℂ^{n_a × n_u}, G(k) ∈ ℂ^{n_y × n_u}, and n_a the order of the model.

We multiply the second equation of (5-79) by z_k^p, and elaborate it by repeatedly substituting z_k X(k) with the first equation of (5-79):

z_k^p G(k) = z_k^{p−1} ( C z_k X(k) + z_k D )
           = z_k^{p−1} ( C A X(k) + C B + z_k D )
           = z_k^{p−2} ( C A² X(k) + C A B + z_k C B + z_k² D )    (5-80)

After p − 1 substitutions, we obtain

z_k^p G(k) = C A^p X(k) + ( C A^{p−1} B + z_k C A^{p−2} B + … + z_k^{p−1} C B + z_k^p D ).    (5-81)

We write down equation (5-81) for p = 0, …, r − 1, with r > n_a:


G(k) = C X(k) + D
z_k G(k) = C A X(k) + C B + z_k D
⋮
z_k^{r−1} G(k) = C A^{r−1} X(k) + C A^{r−2} B + … + z_k^{r−2} C B + z_k^{r−1} D    (5-82)

The extended observability matrix O_r and the matrix S_r that contains the Markov parameters are defined as:

O_r = [ C ; C A ; … ; C A^{r−1} ],
S_r = [ D            0            …    0      0
        C B          D            …    0      0
        …            …            …    …      …
        C A^{r−2} B   C A^{r−3} B   …    C B    D ]    (5-83)

We also define

W_r(k) = [ 1   z_k   …   z_k^{r−1} ]^T.    (5-84)

By applying the definitions (5-83) and (5-84) to the equations of (5-82), we obtain the following relation:

𝒢 = O_r 𝒳 + S_r ℐ,    (5-85)

where the matrices 𝒢, 𝒳, and ℐ are defined as

𝒢 = [ W_r(1) ⊗ G(1)   …   W_r(F) ⊗ G(F) ]
𝒳 = [ X(1)   …   X(F) ]
ℐ = [ W_r(1) ⊗ I_{n_u}   …   W_r(F) ⊗ I_{n_u} ]    (5-86)

The complex data equation in (5-85) is now converted into a real equation:

𝒢^re = O_r 𝒳^re + S_r ℐ^re,    (5-87)

where we define Z^re = [ Re(Z)   Im(Z) ].


Assumption 5.7 We assume the following additive noise setting:

G(k) = G_0(k) + N_G(k),    (5-88)

where the noise matrix N_G(k) has independent (over k), circular complex normally distributed elements with zero mean

𝔼{ N_G(k) } = 0,    (5-89)

and covariance C_G(k):

C_G(k) = 𝔼{ vec(N_G(k)) vec^H(N_G(k)) }.    (5-90)

Equation (5-87) then becomes

𝒢^re = O_r 𝒳^re + S_r ℐ^re + 𝒩_𝒢^re,    (5-91)

with

𝒩_𝒢 = [ W_r(1) ⊗ N_G(1)   …   W_r(F) ⊗ N_G(F) ].    (5-92)

Another assumption concerns the controllability and the observability of the true plant model:

Assumption 5.8 The true plant model can be written in the form (5-79), where (A, C) is observable and (A, B) is controllable.

At first sight, it is awkward to refer to a ‘true’ linear model, whereas the algorithm will be used

to identify a model for a nonlinear DUT. The actual system will definitely not adhere to the

linear representation in (5-79). However, one should bear in mind that, at this stage, the goal

is to retrieve a parametric model for the Best Linear Approximation of the system. From the

viewpoint of the BLA, the nonlinear behaviour of the DUT only results in two kinds of

bias contributions which change the dynamic behaviour of the BLA, and stochastic

contributions which act like ordinary disturbing noise (see also Chapter 2).

The state space matrices can be retrieved from equation (5-91) using the frequency domain

subspace identification algorithm summarized in paragraph B.


B. Frequency Domain Subspace Identification Algorithm [44]

1. Estimate the extended observability matrix O_r, given G(k) and C_G(k).

1a. Initialization: Choose r > n_a and form

Z = [ ℐ^re ; 𝒢^re ]   and   C_N = Re( ∑_{k=1}^{F} W_r(k) W_r^H(k) ⊗ ∑_{i=1}^{n_u} C_G^{i}(k) ),    (5-93)

where C_G^{i}(k) denotes the i-th diagonal partition of C_G(k) (see Appendix 5.B).

1b. Eliminate the input ℐ^re from Z using a QR-decomposition of Z^T: Z^T = R^T Q^T.

Z = [ ℐ^re ; 𝒢^re ] = [ R11^T   0 ; R12^T   R22^T ] [ Q1^T ; Q2^T ]    (5-94)

Define R11^T as the left upper block of r n_u × r n_u elements. Then, R12^T has dimensions r n_y × r n_u, and R22^T (r n_y × r n_y) remains after the elimination of ℐ^re from Z.

1c. Remove the noise influence from (5-91): calculate the SVD of C_N^{−1/2} R22^T:

C_N^{−1/2} R22^T = U Σ V^T,    (5-95)

and estimate O_r as

O_r = C_N^{1/2} U[ : , 1:n_a ].    (5-96)

2. Make use of the shift property of O_r to estimate A and C from O_r:

A = O_r[ 1:(r−1)n_y , : ]^† O_r[ n_y+1 : r n_y , : ]   and   C = O_r[ 1:n_y , : ].    (5-97)

3. Estimate B and D, given A and C: minimize V_SS with respect to B and D:

V_SS = ∑_{k=1}^{F} ε^H(k) C_G^{−1}(k) ε(k),    (5-98)

with

ε(k) = vec( G_SS(A, B, C, D, k) − G(k) ),    (5-99)

G_SS(A, B, C, D, k) = C ( z_k I_{n_a} − A )^{−1} B + D.    (5-100)
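A rough illustration of steps 1-2 (a noiseless SISO sketch, not the thesis implementation: C_G(k) is taken equal to one so the weighting reduces to C_N = Re(∑_k W_r(k) W_r^H(k)), a Cholesky factor is used as the square root of C_N, and the estimation of B and D in step 3 is omitted):

```python
import numpy as np

# Hypothetical second-order SISO test system
A0 = np.array([[1.5, -0.7], [1.0, 0.0]])
B0 = np.array([[1.0], [0.0]])
C0 = np.array([[1.0, 0.5]])
D0 = np.array([[0.1]])
na, r, F = 2, 4, 100

zk = np.exp(1j * np.linspace(0.01, np.pi, F))
G = np.array([(C0 @ np.linalg.inv(z * np.eye(na) - A0) @ B0 + D0)[0, 0] for z in zk])

# W_r(k) and the block-row matrices of (5-84)/(5-86), with n_u = n_y = 1
W = np.vstack([zk ** p for p in range(r)])        # r x F
calI, calG = W, W * G                             # W_r(k) x 1  and  W_r(k) x G(k)
Ire = np.hstack([calI.real, calI.imag])
Gre = np.hstack([calG.real, calG.imag])

# Weighting (5-93) with C_G = 1
CN = (W @ W.conj().T).real
CNh = np.linalg.cholesky(CN)                      # square-root factor of C_N

# Step 1b: QR of Z^T eliminates the input rows, eq. (5-94)
Z = np.vstack([Ire, Gre])
Q, R = np.linalg.qr(Z.T)                          # Z^T = Q R  =>  Z = R^T Q^T
R22T = R.T[r:, r:]

# Step 1c: SVD of C_N^{-1/2} R22^T, eqs. (5-95)-(5-96)
U, s, Vt = np.linalg.svd(np.linalg.solve(CNh, R22T))
Or = CNh @ U[:, :na]

# Step 2: shift property, eq. (5-97)
A = np.linalg.pinv(Or[:-1, :]) @ Or[1:, :]

print("true poles     :", np.sort(np.linalg.eigvals(A0)))
print("estimated poles:", np.sort(np.linalg.eigvals(A)))
```

In the noiseless case the estimated poles coincide with the true ones up to numerical precision, even though the state space basis itself is only recovered up to a similarity transform.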


5.6.3 Nonlinear Optimization of the Linear Model

The weighted least squares cost function V_SS defined in (5-98) is a measure of the model quality. According to this measure, it turns out that the subspace algorithm generates acceptable model estimates. However, in practical applications V_SS strongly depends on the dimension parameter r chosen in step 1a of the identification procedure. A first action that can be taken to improve the model estimates is to apply the subspace algorithm for different values of r, for instance r = n_a + 1, …, 6 n_a, and to select the model that corresponds to the lowest V_SS.

The second way to obtain better modelling results is to consider the cost function

V_WLS = ∑_{k=1}^{F} ε^H(k) C_G^{−1}(k) ε(k),    (5-101)

with

ε(k) = vec( G_SS(A, B, C, D, k) − G(k) ),    (5-102)

and to minimize V_WLS with respect to all the parameters (A, B, C, D). This is a nonlinear problem that can be solved using the Levenberg-Marquardt algorithm (see “The Levenberg-Marquardt Algorithm” on p. 135). This method requires the computation of the Jacobian of the model error ε(k) with respect to the model parameters. From (5-102) and (5-100), we calculate the following expressions:

∂ε(k)/∂A_ij = vec( C ( z_k I_{n_a} − A )^{−1} I_{ij}^{n_a × n_a} ( z_k I_{n_a} − A )^{−1} B )
∂ε(k)/∂B_ij = vec( C ( z_k I_{n_a} − A )^{−1} I_{ij}^{n_a × n_u} )
∂ε(k)/∂C_ij = vec( I_{ij}^{n_y × n_a} ( z_k I_{n_a} − A )^{−1} B )
∂ε(k)/∂D_ij = vec( I_{ij}^{n_y × n_u} )    (5-103)

where I_{ij}^{m × n} denotes an m × n matrix that is zero everywhere except for a one at entry (i, j).
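These expressions are easy to verify numerically; the following sketch (not from the thesis) checks the derivative with respect to one element of A against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(3)
na, nu, ny = 3, 1, 2
A = 0.5 * rng.standard_normal((na, na))
B = rng.standard_normal((na, nu))
C = rng.standard_normal((ny, na))
D = rng.standard_normal((ny, nu))
zk = np.exp(1j * 0.3)

def Gss(A):
    return C @ np.linalg.inv(zk * np.eye(na) - A) @ B + D

i, j = 1, 2
Iij = np.zeros((na, na)); Iij[i, j] = 1.0

# Analytic derivative of G_SS with respect to A_ij, cf. (5-103)
R = np.linalg.inv(zk * np.eye(na) - A)
dG_analytic = C @ R @ Iij @ R @ B

# Central finite difference as a sanity check
eps = 1e-6
dG_fd = (Gss(A + eps * Iij) - Gss(A - eps * Iij)) / (2 * eps)
print(np.max(np.abs(dG_analytic - dG_fd)))   # agreement up to finite-difference error
```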


The subspace method is used to generate a number of initial linear models (e.g. r = n_a + 1, …, 6 n_a), which are used as starting values for the nonlinear optimization procedure. Finally, the model that corresponds to the lowest cost function V_WLS is selected. Because a high number of different initial models is employed, there is a higher probability of ending up in a global minimum of V_WLS, or at least in a good local minimum. Note that in the parameter space, there exists an infinite number of global minimizers for V_WLS, more precisely a subspace of dimension n_a². This is a consequence of fully parametrizing the linear state space representation. Furthermore, while carrying out the nonlinear optimization, the unstable models estimated with the subspace algorithm are stabilized, for instance using the methods described in [14].

To exemplify this method, we apply it to the semi-active damper data set (see “Description of the Experiments” on p. 161), and estimate models of order n_a = 3 for different values of r: r = 3, …, 75. In Figure 5-11, the cost function V_SS of the subspace estimates is shown (grey dots), together with the cost function V_WLS of the optimized models (black dots). Figure 5-11 illustrates that V_SS is a craggy function of r, and that, for this particular data set, the Levenberg-Marquardt algorithm ends up in the same local minimum: the same value of V_WLS is attained for a large number of initial models.

Figure 5-11. WLS cost function V_SS of the subspace estimates (grey dots), and the cost function V_WLS after the nonlinear optimization (black dots), for different values of the dimensioning parameter r.


5.6.4 Estimation of the Full Nonlinear Model

The last step in the identification process is to estimate the full nonlinear model

x(t+1) = A x(t) + B u(t) + E ζ(t)
y(t) = C x(t) + D u(t) + F η(t) + e(t)    (5-104)

with the initial state given by x(1) = x_0, and where e(t) is the output noise. For this, a weighted least squares approach will be employed. In order to keep the estimates of the model parameters unbiased, the following assumption is required.

Assumption 5.9 It is assumed that the input u(t) of the model in (5-104) is noiseless, i.e., it is observed without any errors and independently of the output noise.

In practical situations, it may occur that Assumption 5.9 is not fulfilled. When the SNR at the input is sufficiently high (> 40 dB), the resulting bias in the estimated model parameters is negligible. When the SNR is too low, it can be increased by employing periodic signals: by measuring a sufficient number of periods, and averaging over time or frequency, the SNR is improved in a straightforward way.

The weighted least squares cost function V_WLS with respect to the model parameters

θ = [ vec(A) ; vec(B) ; vec(C) ; vec(D) ; vec(E) ; vec(F) ]

will be minimized:

V_WLS(θ) = ∑_{k=1}^{F} ε^H(k, θ) W(k) ε(k, θ),    (5-105)

where W(k) ∈ ℝ^{n_y × n_y} is a user-chosen, frequency domain weighting matrix. Typically, this matrix is chosen equal to the inverse covariance matrix C_Y^{−1}(k) of the output. This matrix can be obtained straightforwardly when periodic signals are used to excite the DUT. By choosing W(k) properly, it is also possible to put more weight on a certain frequency band of interest. When no covariance information is available and no specific weighting is required by the user, a constant weighting (W(k) = 1, for k = 1, …, F) is employed. Furthermore, the model error ε(k, θ) ∈ ℂ^{n_y} is defined as

ε(k, θ) = Y_m(k, θ) − Y(k),    (5-106)


where Y_m(k, θ) and Y(k) are the DFT of the modelled and the measured output, respectively. Note that when Y_m(k, θ) is calculated with correct initial conditions, equation (5-106) does not pose serious leakage problems in the case of non periodic data, because the leakage terms present in Y_m(k, θ) and Y(k) cancel each other.

A. Calculation of the Jacobian

We minimize V_WLS(θ) by means of the Levenberg-Marquardt algorithm (see “The Levenberg-Marquardt Algorithm” on p. 135). This requires the computation of the Jacobian J(k, θ) of the modelled output with respect to the model parameters. Hence, we need to compute

J(k, θ) = ∂ε(k, θ)/∂θ = ∂Y_m(k, θ)/∂θ.    (5-107)

Given the nonlinear relationship in (5-104), it is impractical to calculate the model output and the Jacobian directly in the frequency domain. Therefore, we will perform these operations in the time domain, followed by a DFT in order to obtain Y_m(k, θ) and J(k, θ).

Before deriving explicit expressions, we recapitulate some general aspects with respect to the calculation of the Jacobian, which are pointed out in [47] and [78].

Consider a general discrete-time nonlinear model

x(t+1) = f(x(t), u(t), a)
y(t) = g(x(t), u(t), b)    (5-108)

where a and b are the model parameters present in the state and output equation, respectively. The derivatives of the output y(t) with respect to a and b are given by

∂x(t+1)/∂a = [ ∂f(x(t), u(t), a)/∂x(t) ] ∂x(t)/∂a + ∂f(x(t), u(t), a)/∂a
∂y(t)/∂a = [ ∂g(x(t), u(t), b)/∂x(t) ] ∂x(t)/∂a
∂y(t)/∂b = ∂g(x(t), u(t), b)/∂b    (5-109)


These equations can be rewritten as follows

x_a(t+1) = f_x(x(t), u(t), a) x_a(t) + f_a(x(t), u(t), a)
y_a(t) = g_x(x(t), u(t), b) x_a(t)
y_b(t) = g_b(x(t), u(t), b)    (5-110)

with x_a(t) = ∂x(t)/∂a, y_a(t) = ∂y(t)/∂a, y_b(t) = ∂y(t)/∂b, and where the subscripts on f and g denote partial derivatives.

Hence, the expressions that define the calculation of the Jacobian (5-110) can be regarded as

a new dynamic discrete-time nonlinear model. The inputs of this Jacobian model are the

inputs and the simulated states of the original model. These states are obtained by simulating

the original model with the estimated parameters of the previous Levenberg-Marquardt

iteration. For model equations (5-104), explicit expressions for the Jacobian can be found in

Appendix 5.D. Furthermore, due to the polynomial nature of (5-104), the equations in (5-110)

are in a polynomial form as well. Hence, a PNLSS model can be determined that calculates the

elements of the Jacobian. In Appendix 5.E, explicit expressions are derived for the state space

matrices of this new model.
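A minimal sketch of this idea (not the thesis code; a scalar PNLSS-type model x(t+1) = a x + b u + e x², y = c x, with the sensitivity recursion written out for the parameter e and checked against a finite difference):

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(100)
a, b, c, e = 0.5, 1.0, 2.0, 0.05

def simulate(e_val):
    x, y = 0.0, []
    for ut in u:
        y.append(c * x)
        x = a * x + b * ut + e_val * x ** 2
    return np.array(y)

# Sensitivity recursion (5-110): the Jacobian obeys its own state equation,
# driven by the simulated states of the original model
x, xe, dy = 0.0, 0.0, []
for ut in u:
    dy.append(c * xe)                    # dy/de = g_x * x_e   (here g_x = c)
    xe = (a + 2 * e * x) * xe + x ** 2   # x_e(t+1) = f_x * x_e + f_e
    x = a * x + b * ut + e * x ** 2

eps = 1e-6
dy_fd = (simulate(e + eps) - simulate(e - eps)) / (2 * eps)
print(np.max(np.abs(np.array(dy) - dy_fd)))   # agreement up to finite-difference error
```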

B. Initial Conditions

In (5-110), the simulated states are employed to calculate the Jacobian. Hence, when the state sequence is computed, the initial state x_0 of the model in (5-104) should be taken into account. For this, three possible approaches are distinguished. The simplest, but rather inefficient, way is to calculate the Jacobian for the full data set, and then to discard the first N_trans transient samples of both the Jacobian and the model error. In this way, a part of the data is not used for the model estimation.

The second method can only be employed when periodic excitations are applied during the

experiments. As mentioned earlier, the simulated states from the previous Levenberg-

Marquardt iteration are used to calculate the Jacobian. By applying several periods of the

input sequence, and by considering only the last simulated state period, the transients

become negligible. This principle is depicted in Figure 5-12 for two input periods. In this

particular example, it suffices to discard the first period to obtain states that are in regime. In

order to save computing time, a fraction of a period can be used as a preamble. This can be

done for highly damped systems, or when the number of samples per period is high.


Figure 5-12. Removal of the transients in the simulated states; state(s) in regime (black), transient (red).

The last method, which is suitable for both periodic and non periodic excitations, is to estimate the initial conditions as if they were ordinary model parameters. This can be achieved in a straightforward way, since the estimation of x_0 is equivalent to estimating an extra column in the state space matrix B. The idea is to add an artificial model input u_art to the model, which only contributes to the state equation in a linear way (i.e., only via the B matrix). The resulting input is then given by

u'(t) = [ u(t) ; u_art(t) ].    (5-111)

Assume that the original input, the state, and the output data sequences are defined for time indices t = 1, …, N. We consider u(0) = 0 and x(0) = 0, and apply an impulse signal to the artificial input of the system:

u_art(t) = 1 for t = 0,   u_art(t) = 0 for t = 1, …, N.    (5-112)

Then, we obtain the following state equation for t = 0:

x(1) = A x(0) + B u'(0) + E ζ(0)
     = B u'(0)
     = B[ : , n_u + 1 ].    (5-113)

Consequently, the initial conditions can be estimated like ordinary model parameters by adding an artificial input u_art to the model.


C. Starting Values

The last obstacle that needs to be cleared before starting the nonlinear optimization is to choose good starting values for θ. For the matrices A, B, C, and D, we will use the estimates obtained from the parametric Best Linear Approximation. The other state space matrices (E and F) are initially set to zero. The idea of using a linear model as a starting point for the nonlinear modelling is certainly not new (e.g. [71]), and is quite often employed.

Using the parametric BLA as the initial nonlinear model offers two important advantages. First of all, it guarantees that the estimated nonlinear model performs at least as well as the best linear model. Secondly, for the model structure in (5-104), this principle results in a rough estimate of the model order n_a.

D. How to handle the similarity transform?

As mentioned earlier, the state space representation is not unique: the similarity transform x_T(t) = T^{−1} x(t) leaves the input/output behaviour unaffected. The elements of the transformation matrix T can be chosen freely, under the condition that T is non singular. The parameter space thus has at least n_a² unnecessary dimensions. This poses a problem for the gradient-based identification of the model parameters θ ∈ ℝ^{n_θ}: the Jacobian does not have full rank and, hence, an infinite number of equivalent solutions exists.

One way to deal with this problem is to use a canonical parametrization, such that the

redundancy disappears. However, it is known that this may lead to numerically ill-conditioned

estimation problems [45].

A second way to cope with the overparametrization is to employ so-called Data Driven Local

Coordinates (DDLC) [45], or a Projected Gradient search [80]. The key idea of these methods

is to identify the manifold of models parametrized by θ in the parameter space, for which the

models have an identical input/output behaviour. Thus, any parameter update for which the

model remains on this manifold does not change the input/output behaviour. Therefore, the

methods presented in [45] and [80] compute the parameter update such that it is locally

orthogonal to the manifold: this is achieved by computing a projection matrix P ∈ ℝ^{n_θ × (n_θ − n_a²)} such that the new Jacobian

J_DDLC(θ) = J(θ) P    (5-114)


has n_a² fewer columns than the original Jacobian J(θ), and has full rank. The matrix P needs to be determined during every iteration step.

The third method to deal with the rank deficiency of the Jacobian consists in using a full

parametrization, and employing a truncated Singular Value Decomposition (for more details,

see “The Levenberg-Marquardt Algorithm” on p. 135). In [86], it is shown that this method and the DDLC method are equivalent: the search direction in the θ-space computed with DDLC and the one obtained by means of a truncated SVD are identical. The additional advantage of the DDLC method is the calculation of n_a² fewer columns of the Jacobian matrix compared to the full parametrization. This can save a considerable amount of computation time, especially when the model order is high. The DDLC approach is feasible when the computation of P is straightforward. This is the case for linear, bilinear, and LPV state space models. However, for the polynomial nonlinear state space model, the calculation of P is very involved. Hence, we will employ the third method: a full parametrization and a truncated SVD.

E. Overfitting and Validation

The nonlinear search should be pursued until the cost function in (5-105) stops decreasing.

However, as is often the case for model structures with many parameters, overfitting can

occur during the nonlinear optimization [70]. This phenomenon can be visualized by applying

a fresh data set to the models obtained from the iterations of the nonlinear search. In the

case of overfitting, the model quality first increases up to an optimum, and then deteriorates

as a function of the number of iterations. The reason for this is the following: at the start of

the optimization, the important parameters are quickly pulled to minimizing values, and

diminish the bias error. As the minimization continues, the less important parameters are

more and more drawn to minimizing values. Hence, a growing number of parameters

becomes activated, and the variance on the parameter estimates increases. In order to avoid

this effect, we use the so-called stopped search [70]: we evaluate the model quality of every

estimated model on a test set, and then select the model that achieves the best result. This

method is a form of implicit regularization, because it prevents the activation of unnecessary

parameters.


F. Stability during Estimation and Validation

We will assume that the parametric linear model obtained from the BLA is stable. As

mentioned before, this can be ensured using stabilizing methods for linear models, like for

instance [14]. Hence, the nonlinear optimization of the full PNLSS model is started from a

stable model. The first phase in the nonlinear optimization consists of calculating the

Jacobian. Hence, the first question is whether this calculation remains stable. Consider the

recursive expressions for the Jacobian given in (5-168). From these equations, it is observed

that the time-varying dynamics of the Jacobian are determined by the factor

A + E ζ'(t).    (5-115)

On the other hand, the Jacobian matrix of the original state equation in (5-104), with respect to the states, is given by

∂x(t+1)/∂x(t) = A + E ζ'(t).    (5-116)

This expression describes the linearised dynamic behaviour of the original model at every time

instance. Since (5-115) and (5-116) are identical, the original model and the Jacobian model

share the same dynamic behaviour (i.e., the instantaneous poles of both models are

identical). Consequently, a stable model always yields a stable Jacobian.

The second question is whether a parameter update during the nonlinear optimization yields a

stable model. Naturally, this is not necessarily the case. When instability occurs, it will be

reflected by the value of the cost function (Inf or NaN). This phenomenon can easily be

handled by the nonlinear optimization procedure, as if it was an ordinary increase of the cost

function.

On experimental data, it occurs from time to time that the estimated model becomes unstable

on the validation set. To overcome this problem, the following heuristic approach is employed.

The validation input signal is also passed on to the nonlinear optimization algorithm. In this

way, the validation output of the updated model with parameters θ_test (see Figure 5-14 in

“The Levenberg-Marquardt Algorithm” on p. 135) can be computed in every iteration. When

the validation output is unstable, the optimization algorithm reacts as if the cost function has

increased. This approach guarantees a model which is stable for the validation set.


Nevertheless, this procedure prevents the iterative search from passing through an unstable (validation) zone before ending up in a stable zone again. Consequently, this method should only be applied when it is strictly necessary.


Appendix 5.A Some Combinatorials

In this appendix, we calculate the number of distinct monomials in n variables of a given degree r. Choosing r different elements out of a set of n elements can be done in a number of different ways, which is given by the binomial coefficient:

( n choose r ) = n! / ( r! (n − r)! )    (5-117)

For instance, if we have to choose two different elements from {1, 2, 3, 4}, this results in

( 4 choose 2 ) = 4! / ( 2! (4 − 2)! ) = 6    (5-118)

combinations, namely

{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.    (5-119)

We would also like to add identical combinations, like {1, 1} for instance. To do so, we need to add one dummy variable s to the set {1, 2, 3, 4}. Then, the resulting 10 combinations are:

{1, 2}, {1, 3}, {1, 4}, {1, s} with s = 1,
{2, 3}, {2, 4}, {2, s} with s = 2,
{3, 4}, {3, s} with s = 3,
{4, s} with s = 4.    (5-120)

In general, we need to add r − 1 dummy variables in order to obtain

( n + r − 1 choose r ) = (n + r − 1)! / ( r! (n − 1)! )    (5-121)

monomials of degree r in n variables.
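The count in (5-121) is quickly verified by enumeration; the following sketch (not from the thesis) lists the monomials of degree r as multisets of variable indices:

```python
from itertools import combinations_with_replacement
from math import comb

n, r = 4, 2
monomials = list(combinations_with_replacement(range(1, n + 1), r))
print(len(monomials), comb(n + r - 1, r))   # both print 10 for n = 4, r = 2
```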


Appendix 5.B Construction of the Subspace Weighting Matrix from the FRF Covariance

The subspace identification algorithm requires the computation of the weighting matrix C_N, which is defined as

C_N = Re( 𝔼{ 𝒩_𝒢 𝒩_𝒢^H } ).    (5-122)

In Chapter 2, we have determined the covariance matrix C_G(k) as:

C_G(k) = 𝔼{ vec(N_G(k)) vec^H(N_G(k)) }.    (5-123)

The purpose of this appendix is to find an expression for C_N as a function of the elements of C_G(k).

When we substitute (5-92) in (5-122), we obtain:

C_N = Re( 𝔼{ ∑_{k=1}^{F} [ W_r(k) ⊗ N_G(k) ] [ W_r(k) ⊗ N_G(k) ]^H } ).    (5-124)

The following identities hold:

( A ⊗ B )^H = A^H ⊗ B^H
( A ⊗ B )( C ⊗ D ) = AC ⊗ BD    (5-125)

i.e., the Hermitian transpose of a Kronecker product and the Mixed Product rule. Applying these Kronecker product properties to (5-124) results in:

C_N = Re( ∑_{k=1}^{F} W_r(k) W_r^H(k) ⊗ 𝔼{ N_G(k) N_G^H(k) } ).    (5-126)

We denote the i-th column of N_G(k) as N_{[:,i]}(k) and obtain

N_G(k) N_G^H(k) = ∑_{i=1}^{n_u} N_{[:,i]}(k) N_{[:,i]}^H(k).    (5-127)

On the other hand, we also have that


N = vec(N_G(k)) vec^H(N_G(k)) = [ N_{[:,1]}(k) ; … ; N_{[:,n_u]}(k) ] [ N_{[:,1]}^H(k)   …   N_{[:,n_u]}^H(k) ].    (5-128)

Hence, the partition at position (i, j) in N, with dimensions n_y × n_y (see Figure 5-13), is given by

N_{ij} = N_{[:,i]}(k) N_{[:,j]}^H(k).    (5-129)

Figure 5-13. N divided into n_y × n_y partitions.

Taking into account equation (5-127), it is clear that 𝔼{ N_G(k) N_G^H(k) } can be computed from the elements of C_G(k) = 𝔼{ N(k) }. First, C_G(k) should be divided into n_u × n_u partitions, in which each partition contains n_y × n_y elements. Next, the diagonal partitions should be summed in order to determine 𝔼{ N_G(k) N_G^H(k) }. Finally, we obtain

C_N = Re( ∑_{k=1}^{F} W_r(k) W_r^H(k) ⊗ ∑_{i=1}^{n_u} C_G^{i}(k) ),    (5-130)

where C_G^{i}(k) denotes the i-th partition on the diagonal of C_G(k).
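The two Kronecker identities in (5-125), on which this derivation rests, are easily checked numerically (a small sanity-check sketch, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 5))
D = rng.standard_normal((2, 6))

# Hermitian transpose of a Kronecker product
print(np.allclose(np.kron(A, B).conj().T, np.kron(A.conj().T, B.conj().T)))

# Mixed Product rule: (A x B)(C x D) = AC x BD
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
```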


Appendix 5.C Nonlinear Optimization Methods

Consider the cost function V(θ, Z) which is a function of the parameter vector θ ∈ ℝ^{n_θ} and the measurements Z. In this appendix, we will summarize a number of standard iterative nonlinear optimization methods that can be used to minimize V(θ, Z) with respect to θ. Given their iterative nature, these methods all have in common the computation of a parameter update Δθ.

A. The Gradient Descent Algorithm

The gradient descent algorithm is the most intuitive method to find the minimum of a function. In this iterative procedure, the parameter update Δθ is proportional to the negative gradient ∇V of the cost function

Δθ = −λ ∇V,    (5-131)

with λ the damping factor. The main advantages of the gradient method are its conceptual simplicity and its large region of convergence to a (local) minimum. The most important drawback is its slow convergence.

B. The Gauss-Newton Algorithm

When a quadratic cost function V(θ, Z) needs to be minimized:

V(θ, Z) = e^T(θ, Z) e(θ, Z) = ∑_{k=1}^{N} e_k²(θ, Z),    (5-132)

with e(θ, Z) ∈ ℝ^N a residual, the Gauss-Newton algorithm is well suited. The reason for this is that this iterative procedure makes explicit use of the quadratic nature of the cost function. This results in a faster convergence compared to the gradient method [25]. The parameter update Δθ of this method is given by

Δθ = −( ∇²V )^{−1} ∇V.    (5-133)

This approach requires the knowledge of the Hessian matrix ∇²V (i.e., the matrix containing the second derivatives) and the gradient ∇V of the cost function, both with respect to θ.

Further on, it will become clear that, for a quadratic cost function, the Hessian can be approximated by making use of only the first order derivatives of e(\theta, Z). The Hessian matrix and the gradient are given by

    \nabla^2 V = \frac{\partial^2 V(\theta, Z)}{\partial \theta^2} = 2 J^T(\theta, Z) J(\theta, Z) + 2 \sum_{k=1}^{N} e_k(\theta, Z)\, \frac{\partial^2 e_k(\theta, Z)}{\partial \theta^2},    (5-134)

    \nabla V = \frac{\partial V(\theta, Z)}{\partial \theta} = 2 J^T(\theta, Z)\, e(\theta, Z),    (5-135)

where J(\theta, Z) is defined as the Jacobian matrix of e(\theta, Z) with respect to \theta:

    J(\theta, Z) = \frac{\partial e(\theta, Z)}{\partial \theta}.    (5-136)

When the residuals e_k(\theta, Z) are small, the second term in (5-134) is negligible compared with the first term. Hence, the Hessian can be approximated by

    \frac{\partial^2 V(\theta, Z)}{\partial \theta^2} \cong 2 J^T(\theta, Z) J(\theta, Z),    (5-137)

which is only a function of the Jacobian. Hence, the Gauss-Newton parameter update \Delta\theta is found by solving

    J^T(\theta, Z) J(\theta, Z)\, \Delta\theta = -J^T(\theta, Z)\, e(\theta, Z).    (5-138)

The step \Delta\theta can be computed in a numerically stable way via the Singular Value Decomposition (SVD) [27] of J(\theta, Z):

    J(\theta, Z) = U \Sigma V^T.    (5-139)

The parameter update is then given by

    \Delta\theta = -V \Sigma^{-1} U^T e(\theta, Z).    (5-140)

If J(\theta, Z) is not of full rank, then \Sigma is singular and \Sigma^{-1} does not exist, so a truncated SVD should be used in order to compute (5-140). This occurs, for example, when an overparametrized model is utilized.
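As a minimal illustration (Python/NumPy; not the thesis implementation), the Gauss-Newton step of (5-138)-(5-140) can be computed from the SVD of the Jacobian, truncating the numerically zero singular values when J is rank deficient.

```python
import numpy as np

def gauss_newton_step(J, e, rtol=1e-12):
    """Solve J^T J dtheta = -J^T e through a (truncated) SVD of J, cf. (5-138)-(5-140)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    keep = s > rtol * s[0]          # drop numerically zero singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return -(Vt.T * s_inv) @ (U.T @ e)
```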

The convergence rate of the Gauss-Newton algorithm depends on the assumption that the residuals e_k(\theta, Z) are small. If this is the case, then the convergence rate is quadratic; otherwise it drops to superlinear or even linear convergence. The main drawback of the Gauss-Newton algorithm is its smaller region of convergence compared with the gradient method.

C. The Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm [36],[42] combines the large convergence region of the gradient descent method with the fast convergence of the Gauss-Newton method. In order to increase the numerical stability and to avoid comparing apples with oranges, the columns of the Jacobian matrix J(\theta, Z) need to be normalized prior to the computation of the parameter update. The normalized Jacobian matrix J_N(\theta, Z) is given by

    J_N(\theta, Z) = J(\theta, Z)\, N,    (5-141)

where the diagonal normalization matrix N \in \mathbb{R}^{n_\theta \times n_\theta} is defined as

    N = \mathrm{diag}\Big( \frac{1}{\mathrm{rms}(J_{[:,1]}(\theta, Z))}, \ldots, \frac{1}{\mathrm{rms}(J_{[:,n_\theta]}(\theta, Z))} \Big).    (5-142)

In most cases, the normalization yields a better condition number (i.e., the ratio between the largest and the smallest non-zero singular value) for J_N(\theta, Z) compared with J(\theta, Z). Next, the parameter update \Delta\theta_N is computed by solving the equation

    \big( J_N^T(\theta, Z) J_N(\theta, Z) + \lambda^2 I_{n_\theta} \big)\, \Delta\theta_N = -J_N^T(\theta, Z)\, e(\theta, Z),    (5-143)

where the damping factor \lambda determines the weight between the two methods. If \lambda has a large numerical value, then the second term in (5-143) is important, and hence the gradient descent method dominates. When \lambda is small, the Gauss-Newton method takes over.

In order to compute (5-143) in a numerically stable way, the SVD of J_N(\theta, Z) is calculated first. When the Jacobian is singular, J_N(\theta, Z) has rank n_r < n_\theta and the SVD is given by

    J_N(\theta, Z) = U\, \mathrm{diag}\big( \sigma_1, \sigma_2, \ldots, \sigma_{n_r}, 0, \ldots, 0 \big)\, V^T.    (5-144)

Next, the parameter update \Delta\theta_N is calculated using a truncated SVD. This results in

    \Delta\theta_N = -V \Lambda U^T e(\theta, Z),    (5-145)

where the matrix \Lambda is defined as

    \Lambda = \mathrm{diag}\Big( \frac{\sigma_1}{\sigma_1^2 + \lambda^2}, \frac{\sigma_2}{\sigma_2^2 + \lambda^2}, \ldots, \frac{\sigma_{n_r}}{\sigma_{n_r}^2 + \lambda^2}, 0, \ldots, 0 \Big).    (5-146)

In the last step, the parameter update \Delta\theta_N needs to be denormalized again:

    \Delta\theta = N\, \Delta\theta_N.    (5-147)

As a starting value for \lambda, the largest singular value of J_N(\theta, Z) from the first iteration can be used [25]. Next, \lambda is adjusted according to the success of the parameter update. When the cost function decreases, the approximation made in (5-137) works well. Hence, \lambda should be decreased such that the Gauss-Newton influence becomes more important. Conversely, when the cost function increases, the gradient descent method should gain more weight: this is obtained by increasing \lambda.

Different stop criteria can be employed to bring the iterative Levenberg-Marquardt algorithm to an end. For instance, the optimization can be stopped when the relative decrease of the cost function becomes smaller than a user-chosen value, or when the relative update of the parameter vector becomes too small. However, the simplest approach is to stop the optimization when a sufficiently high number of iterations i_max is exceeded. A full optimization scheme that makes use of this stop criterion is shown in Figure 5-14. In practice, we will also evaluate the cost function on the validation set, and choose the model which performs best on this data set (see "Overfitting and Validation" on p. 127).

[Figure 5-14. Levenberg-Marquardt algorithm (flowchart): initialize theta and lambda; in each iteration, compute and normalize the Jacobian, compute and denormalize the update from the SVD of J_N, and evaluate the tentative cost V_test; if V_test < V, accept the update and halve lambda, otherwise multiply lambda by 10; stop after i_max iterations.]
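The scheme of Figure 5-14 can be summarized in a few lines. The sketch below (Python/NumPy; illustrative only) assumes a real-valued residual function res(theta) and a Jacobian function jac(theta) are given, and implements the column normalization, the damped truncated-SVD update and the adaptation of the damping factor described above.

```python
import numpy as np

def levenberg_marquardt(res, jac, theta0, imax=100):
    """Minimal Levenberg-Marquardt loop in the spirit of Figure 5-14 (sketch)."""
    theta = np.asarray(theta0, dtype=float)
    V = np.sum(res(theta) ** 2)
    lam = None
    for _ in range(imax):
        e, J = res(theta), jac(theta)
        scale = np.sqrt(np.mean(J ** 2, axis=0))          # rms of each column, cf. (5-142)
        scale[scale == 0] = 1.0
        JN = J / scale                                     # normalized Jacobian, cf. (5-141)
        U, s, Vt = np.linalg.svd(JN, full_matrices=False)
        if lam is None:
            lam = s[0]                                     # starting value for the damping factor
        d = s / (s ** 2 + lam ** 2)                        # damped inverse singular values, cf. (5-146)
        dtheta = (-(Vt.T * d) @ (U.T @ e)) / scale         # update and denormalization, cf. (5-145), (5-147)
        theta_test = theta + dtheta
        V_test = np.sum(res(theta_test) ** 2)
        if V_test < V:                                     # success: favour Gauss-Newton
            theta, V, lam = theta_test, V_test, lam / 2
        else:                                              # failure: favour gradient descent
            lam *= 10
    return theta
```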

D. Dealing with Complex Data

Suppose the following cost function needs to be minimized:

    V(\theta, Z) = \varepsilon^H(\theta, Z)\, \varepsilon(\theta, Z) = \sum_{k=1}^{N} |\varepsilon_k(\theta, Z)|^2,    (5-148)

with \varepsilon(\theta, Z) \in \mathbb{C}^{N} a complex residual. V(\theta, Z) can be rewritten as

    V(\theta, Z) = \varepsilon_{re}^T(\theta, Z)\, \varepsilon_{re}(\theta, Z),    (5-149)

where \varepsilon_{re}(\theta, Z) is defined as

    \varepsilon_{re}(\theta, Z) = \begin{bmatrix} \mathrm{Re}\big(\varepsilon(\theta, Z)\big) \\ \mathrm{Im}\big(\varepsilon(\theta, Z)\big) \end{bmatrix}.    (5-150)

Furthermore, the matrix J_{re}(\theta, Z) is defined as

    J_{re}(\theta, Z) = \begin{bmatrix} \mathrm{Re}\big(J(\theta, Z)\big) \\ \mathrm{Im}\big(J(\theta, Z)\big) \end{bmatrix}.    (5-151)

The quantities \varepsilon_{re}(\theta, Z) and J_{re}(\theta, Z) are thus real-valued, which allows us to recycle the ideas described in section C.

E. Weighted Least Squares

In general, a Weighted Least Squares (WLS) cost function is defined as

    V_{WLS}(\theta, Z) = \varepsilon^H(\theta, Z)\, W\, \varepsilon(\theta, Z),    (5-152)

where W \in \mathbb{C}^{N \times N} is a Hermitian, positive definite weighting matrix. Any Hermitian positive (semi-)definite matrix can be decomposed as [27]:

    W = W^{1/2}\, W^{1/2},    (5-153)

where the square root matrix W^{1/2} is also Hermitian. Using the SVD of W, W = U \Sigma V^H, the square root W^{1/2} can be calculated straightforwardly:

    W^{1/2} = V \Sigma^{1/2} V^H = \big( W^{1/2} \big)^H.    (5-154)

For real matrices, a similar result holds:

    W^{1/2} = V \Sigma^{1/2} V^T = \big( W^{1/2} \big)^T.    (5-155)

Equation (5-152) can thus be rewritten as

    V_{WLS}(\theta, Z) = \varepsilon^H(\theta, Z) \big( W^{1/2} \big)^H W^{1/2} \varepsilon(\theta, Z) = \big( W^{1/2} \varepsilon(\theta, Z) \big)^H W^{1/2} \varepsilon(\theta, Z),    (5-156)

or

    V_{WLS}(\theta, Z) = \tilde{\varepsilon}^H(\theta, Z)\, \tilde{\varepsilon}(\theta, Z),    (5-157)

with

    \tilde{\varepsilon}(\theta, Z) = W^{1/2} \varepsilon(\theta, Z).    (5-158)

The Jacobian of \tilde{\varepsilon}(\theta, Z) is then given by

    \tilde{J}(\theta, Z) = \frac{\partial \tilde{\varepsilon}(\theta, Z)}{\partial \theta} = \frac{\partial \big( W^{1/2} \varepsilon(\theta, Z) \big)}{\partial \theta} = W^{1/2} J(\theta, Z).    (5-159)

In this way, we recast the WLS problem such that it can be solved using the techniques from sections C and D.
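In code, the two recasts of sections D and E are one-liners: stack real and imaginary parts, and premultiply by a square root of the weighting matrix. A sketch is given below (Python/NumPy; illustrative names). For a Hermitian positive (semi-)definite W the eigenvalue decomposition coincides with the SVD, so it is used here to build W^{1/2}.

```python
import numpy as np

def make_real(eps, J):
    """Stack real and imaginary parts of a complex residual and Jacobian, cf. (5-150)-(5-151)."""
    return np.concatenate([eps.real, eps.imag]), np.vstack([J.real, J.imag])

def apply_weighting(W, eps, J):
    """Premultiply residual and Jacobian by W^{1/2}, cf. (5-156)-(5-159)."""
    w, V = np.linalg.eigh(W)                         # W = V diag(w) V^H
    W_half = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    return W_half @ eps, W_half @ J
```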

Appendix 5.D Explicit Expressions for the PNLSS Jacobian

In this appendix, we compute explicit expressions for the derivatives of the model output (5-104) with respect to the parameters \theta. We first define the matrices \zeta'(t) \in \mathbb{R}^{n_\zeta \times n_a} and \eta'(t) \in \mathbb{R}^{n_\eta \times n_a} as

    \zeta'(t) = \frac{\partial \zeta(t)}{\partial x(t)} = \Big[ \frac{\partial \zeta(t)}{\partial x_1(t)} \;\; \cdots \;\; \frac{\partial \zeta(t)}{\partial x_{n_a}(t)} \Big], \qquad \eta'(t) = \frac{\partial \eta(t)}{\partial x(t)} = \Big[ \frac{\partial \eta(t)}{\partial x_1(t)} \;\; \cdots \;\; \frac{\partial \eta(t)}{\partial x_{n_a}(t)} \Big].    (5-160)

I_{ij}^{m \times n} \in \mathbb{R}^{m \times n} denotes a zero matrix with a single element equal to one at entry (i, j):

    \big( I_{ij}^{m \times n} \big)_{kl} = \begin{cases} 1 & (k, l) = (i, j), \\ 0 & \text{otherwise.} \end{cases}    (5-161)

We begin by computing the Jacobian with respect to the elements of the matrix A. The derivative of the output equation with respect to A_{ij} is given by:

    \frac{\partial y(t)}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \big( C x(t) + D u(t) + F \eta(t) \big) = C \frac{\partial x(t)}{\partial A_{ij}} + F \eta'(t) \frac{\partial x(t)}{\partial A_{ij}}.    (5-162)

In order to determine the right hand side of (5-162), we also need the derivatives of the state equation, which are given by

    \frac{\partial x(t+1)}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \big( A x(t) + B u(t) + E \zeta(t) \big).    (5-163)

We define x_{A_{ij}}(t) \in \mathbb{R}^{n_a} as

    x_{A_{ij}}(t) = \frac{\partial x(t)}{\partial A_{ij}}.    (5-164)

Then, equation (5-163) is rewritten as

    x_{A_{ij}}(t+1) = I_{ij}^{n_a \times n_a}\, x(t) + \big( A + E \zeta'(t) \big)\, x_{A_{ij}}(t).    (5-165)

Combining equations (5-162) and (5-165) results in

    x_{A_{ij}}(t+1) = I_{ij}^{n_a \times n_a}\, x(t) + \big( A + E \zeta'(t) \big)\, x_{A_{ij}}(t),
    J_{A_{ij}}(t) = \big( C + F \eta'(t) \big)\, x_{A_{ij}}(t),    (5-166)

where J_{A_{ij}}(t) \in \mathbb{R}^{n_y} is defined as

    J_{A_{ij}}(t) = \frac{\partial y(t)}{\partial A_{ij}}.    (5-167)

The Jacobians with respect to the other model parameters are computed in a similar way. We summarize the results below:

    x_{B_{ij}}(t+1) = I_{ij}^{n_a \times n_u}\, u(t) + \big( A + E \zeta'(t) \big)\, x_{B_{ij}}(t), \qquad J_{B_{ij}}(t) = \big( C + F \eta'(t) \big)\, x_{B_{ij}}(t),
    x_{E_{ij}}(t+1) = I_{ij}^{n_a \times n_\zeta}\, \zeta(t) + \big( A + E \zeta'(t) \big)\, x_{E_{ij}}(t), \qquad J_{E_{ij}}(t) = \big( C + F \eta'(t) \big)\, x_{E_{ij}}(t),
    J_{C_{ij}}(t) = I_{ij}^{n_y \times n_a}\, x(t), \qquad J_{D_{ij}}(t) = I_{ij}^{n_y \times n_u}\, u(t), \qquad J_{F_{ij}}(t) = I_{ij}^{n_y \times n_\eta}\, \eta(t).    (5-168)

Appendix 5.E Computation of the Jacobian Regarded as an Alternative PNLSS System

It is clear from (5-168) that the computation of J_{A_{ij}}(t), J_{B_{ij}}(t), and J_{E_{ij}}(t) is equivalent to calculating the output of an alternative PNLSS system. We consider here, for instance, the calculation of J_{A_{ij}}(t), and we will attempt to write equations (5-166) in the following form:

    \tilde{x}(t+1) = \tilde{A} \tilde{x}(t) + \tilde{B} \tilde{u}(t) + \tilde{E} \tilde{\zeta}(t),
    \tilde{y}(t) = \tilde{C} \tilde{x}(t) + \tilde{F} \tilde{\eta}(t).    (5-169)

For this, we define the new inputs, states and outputs as follows:

    \tilde{u}(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \qquad \tilde{x}(t) = x_{A_{ij}}(t), \qquad \tilde{y}(t) = J_{A_{ij}}(t).    (5-170)

A number of relations between the original and the new system matrices are trivial:

    \tilde{A} = A, \qquad \tilde{C} = C, \qquad \tilde{B} = \big[ I_{ij}^{n_a \times n_a} \;\; 0^{n_a \times n_u} \big], \qquad \tilde{D} = 0.    (5-171)

The remaining system matrices and monomials require slightly more effort to determine. The goal of the following calculations is to rewrite the terms E \zeta'(t) \tilde{x}(t) and F \eta'(t) \tilde{x}(t) as \tilde{E} \tilde{\zeta}(t) and \tilde{F} \tilde{\eta}(t), respectively. The time indices will be omitted for the sake of simplicity.

Using the multinomial notation, and with the multi-index \alpha_j defined as the powers of the j-th monomial \zeta_j, we have:

    \zeta = \begin{bmatrix} \zeta_1 \\ \vdots \\ \zeta_{n_\zeta} \end{bmatrix} = \begin{bmatrix} \tilde{u}^{\alpha_1} \\ \vdots \\ \tilde{u}^{\alpha_{n_\zeta}} \end{bmatrix}.    (5-172)

Next, we derive expressions for \zeta' = \partial \zeta / \partial x. The derivative of \zeta_j with respect to the state variable x_i is equal to \alpha_j(i)\, \zeta_j\, x_i^{-1}. We can neglect the presence of the factor x_i^{-1} when x_i is not present in a given monomial, since the corresponding \alpha_j(i) is in that case equal to zero. Hence, we obtain the following relation:

    \zeta' = \begin{bmatrix} \alpha_1(1)\, \zeta_1\, x_1^{-1} & \cdots & \alpha_1(n_a)\, \zeta_1\, x_{n_a}^{-1} \\ \vdots & & \vdots \\ \alpha_{n_\zeta}(1)\, \zeta_{n_\zeta}\, x_1^{-1} & \cdots & \alpha_{n_\zeta}(n_a)\, \zeta_{n_\zeta}\, x_{n_a}^{-1} \end{bmatrix}.    (5-173)

Then, the product \zeta' \tilde{x} is given by

    \zeta' \tilde{x} = \begin{bmatrix} \alpha_1(1)\, \zeta_1 & \cdots & \alpha_1(n_a)\, \zeta_1 \\ \vdots & & \vdots \\ \alpha_{n_\zeta}(1)\, \zeta_{n_\zeta} & \cdots & \alpha_{n_\zeta}(n_a)\, \zeta_{n_\zeta} \end{bmatrix} \begin{bmatrix} \tilde{x}_1 x_1^{-1} \\ \vdots \\ \tilde{x}_{n_a} x_{n_a}^{-1} \end{bmatrix}.    (5-174)

We define the new j-th monomial \tilde{\zeta}_j as

    \tilde{\zeta}_j = \zeta_j \begin{bmatrix} \tilde{x}_1 x_1^{-1} \\ \vdots \\ \tilde{x}_{n_a} x_{n_a}^{-1} \end{bmatrix} = \zeta_j \begin{bmatrix} \tilde{x}_1 \tilde{u}_1^{-1} \\ \vdots \\ \tilde{x}_{n_a} \tilde{u}_{n_a}^{-1} \end{bmatrix},    (5-175)

which allows us to rewrite (5-174) as

    \zeta' \tilde{x} = \begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_{n_\zeta} \end{bmatrix} \begin{bmatrix} \tilde{\zeta}_1 \\ \tilde{\zeta}_2 \\ \vdots \\ \tilde{\zeta}_{n_\zeta} \end{bmatrix},    (5-176)

where each \alpha_j is interpreted as the row vector [\alpha_j(1) \; \cdots \; \alpha_j(n_a)]. This leads to the definition of the new matrix \tilde{E}:

    \tilde{E} = E \begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_{n_\zeta} \end{bmatrix}.    (5-177)
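As a small illustration of (5-177) (Python/NumPy; illustrative only, where `alpha` is an (n_zeta x na) array holding the state exponents alpha_j(i) of the monomials zeta_j), the coefficient matrix of the alternative system is simply E times a block-diagonal arrangement of the rows of `alpha`:

```python
import numpy as np

def E_tilde(E, alpha):
    """Build the E matrix of the alternative PNLSS system, cf. (5-177).

    E     : (na, n_zeta) polynomial coefficient matrix of the original model
    alpha : (n_zeta, na) state exponents alpha_j(i) of the monomials zeta_j
    """
    n_zeta, na = alpha.shape
    B = np.zeros((n_zeta, n_zeta * na))
    for j in range(n_zeta):
        B[j, j * na:(j + 1) * na] = alpha[j]    # row vector alpha_j on the block diagonal
    return E @ B                                # shape (na, n_zeta * na)
```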


CHAPTER 6
APPLICATIONS OF THE POLYNOMIAL NONLINEAR STATE SPACE MODEL

In this chapter, the nonlinear state space approach is applied to a

number of real-life systems: the Silverbox, a combine harvester, a

semi-active damper, a quarter car set-up, a robot arm, a Wiener-

Hammerstein system, and a crystal detector. For each case study, the

Device Under Test (DUT) and the performed experiments are

described. Next, the Best Linear Approximation is estimated and a

nonlinear state space model is built. Whenever possible, we compare

our approach with other modelling methods.


6.1 Silverbox

6.1.1 Description of the DUT

The Silverbox is an electronic circuit that emulates the behaviour of a mass-spring-damper system (Figure 6-1). The input u_c of the system is the force applied to the mass m; the output y_c represents the mass displacement. The spring acts nonlinearly and is characterized by the parameters k_1 and k_3. Since k_3 is positive, the spring is hardening. This means that relatively more force is required as the spring is extended. The equation that describes the system's behaviour is given by

    m \ddot{y}_c(t) + d \dot{y}_c(t) + k_1 y_c(t) + k_3 y_c^3(t) = u_c(t).    (6-1)

The parameter d determines the damping which is present in the system. For a sinusoidal input u_c(t), (6-1) is also known as the Duffing equation (see "Duffing Oscillator" on p. 111).
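To get a feel for the behaviour described by (6-1), the equation is easily simulated. The sketch below (Python/NumPy; the parameter values are arbitrary illustrations, not the identified Silverbox values) integrates the Duffing equation with a simple semi-implicit Euler scheme.

```python
import numpy as np

def simulate_duffing(u, dt, m=1.0, d=0.2, k1=1.0, k3=10.0):
    """Integrate m*y'' + d*y' + k1*y + k3*y^3 = u(t), cf. (6-1)."""
    y = v = 0.0
    out = np.empty(len(u))
    for t, ut in enumerate(u):
        a = (ut - d * v - k1 * y - k3 * y ** 3) / m   # acceleration
        v += dt * a                                    # semi-implicit Euler step
        y += dt * v
        out[t] = y
    return out

# example: response to a noise excitation with linearly increasing amplitude
rng = np.random.default_rng(0)
N = 20000
y = simulate_duffing(np.linspace(0, 1, N) * rng.standard_normal(N), dt=1e-3)
```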

6.1.2 Description of the Experiments

The applied excitation signal consists of two parts (Figure 6-2). The first part of the signal is

filtered Gaussian white noise with a linearly increasing RMS value as a function of time. This

sequence consists of 40 700 samples and has a bandwidth of 200 Hz. The average RMS value

of the signal is 22.3 mV. This data set will be used to validate the models. The second part of

the excitation signal contains 10 realizations of an odd random phase multisine with 8192

samples and 500 transient points per realization. The bandwidth of the excitation signal is also

200 Hz and its RMS value is 22.3 mV. This sequence is applied once to the system under test

and will be used to estimate the models. In all experiments, the input and output signals are

measured at a sampling frequency of 10 MHz / 2^14 = 610.35 Hz.

[Figure 6-1. Mass-spring-damper system: mass m, damping d, nonlinear spring (k_1, k_3), input force u_c.]

6.1.3 Best Linear Approximation

In order to obtain a nonparametric estimate of the Best Linear Approximation (BLA), the FRF is determined for every phase realization of the estimation data set. The BLA G_{BLA}(j\omega_k) is then calculated by averaging those FRFs. Next, a parametric second order linear state space model is estimated. From this model, initial values will be extracted in order to estimate some nonlinear models. The results are plotted in Figure 6-3: the top and bottom plot show the amplitude and phase of the BLA, respectively. The solid black line denotes the BLA; the solid grey line represents the linear model. The total standard deviation \sigma_{BLA}(k) is also given (black dashed line), together with the model error (dashed grey line), i.e., the difference between the measured BLA and the linear model.

Unfortunately, only one period per realization was measured. Hence, no distinction can be

made between the nonlinear contributions and the measurement noise (see “Periodic Data”

on p. 28).
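For periodic excitations, the nonparametric BLA used throughout this chapter is obtained by averaging the FRFs over the different random phase realizations; the spread over the realizations gives the total standard deviation (noise plus nonlinear variability). A sketch of this estimate is given below (Python/NumPy; illustrative only, with one steady-state period of input and output per realization and `lines` holding the excited DFT bins).

```python
import numpy as np

def bla_periodic(u, y, lines):
    """BLA and its total standard deviation from M phase realizations.

    u, y  : (M, N) arrays, one period per realization.
    lines : indices of the excited frequency lines.
    """
    U = np.fft.fft(u, axis=1)[:, lines]
    Y = np.fft.fft(y, axis=1)[:, lines]
    G = Y / U                                                 # FRF of each realization
    G_bla = G.mean(axis=0)
    sigma_bla = G.std(axis=0, ddof=1) / np.sqrt(G.shape[0])   # std. dev. of the mean
    return G_bla, sigma_bla
```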

[Figure 6-2. Excitation signal that contains a validation and an estimation set.]

From Figure 6-3, we observe that the linear model is of good quality: up to a frequency of 120 Hz, the model error coincides with the standard deviation. A statistically significant, but small, model error is present in the frequency band between 120 Hz and 200 Hz. This error is surprising, because equation (6-1) corresponds to a linear, second order system when the nonlinear term k_3 y_c^3(t) is omitted. To explain this behaviour, the way the measurements were made should be taken into account. A band-limited (BL) set-up was employed during the measurements [56]. Hence, a discrete-time, second order model does not suffice to model this continuous-time, second order system. Although the model error disappears for a third order model, we will neglect it and continue with the second order model.

The second data set is now used to validate the linear model. The measured output of the system is plotted in Figure 6-4 (black line) together with the simulation error (grey line). The RMS value of the simulation error (RMSE) is 13.7 mV. This number should be compared with the RMS output level, which measures 53.4 mV.

[Figure 6-3. BLA of the Silverbox (solid black line); total standard deviation (black dashed line); 2nd order linear model (solid grey line); model error (dashed grey line). Top: amplitude [dB]; bottom: phase [deg]; frequency axis 0-200 Hz.]

6.1.4 Nonlinear Model

Now, we will investigate whether better modelling results can be obtained with a nonlinear model. First, a second order polynomial nonlinear state space (PNLSS) model is estimated with the following settings:

    \xi(t) = \big[ x_1(t) \;\; x_2(t) \;\; u(t) \big]^T    (6-2)

and

    \zeta(t) = \xi(t)^{\{3\}}, \qquad \eta(t) = 0.    (6-3)

Hence, the nonlinear degree in the state equation is nx = [2 3]. We include all cross products of the states and the input. In the output equation, only the linear terms are present (ny = 0). This results in a nonlinear model that contains 37 parameters.

[Figure 6-4. Validation result for the 2nd order linear model: measured output (black) and model simulation error (grey). RMSE: 13.7 mV.]

The validation results for this nonlinear model are shown in Figure 6-5. Again, the measured output signal is denoted by the black line; the simulation error of the nonlinear model is plotted in grey. The RMS value of the model error has dropped significantly, from 13.7 mV for the linear model to 0.26 mV for the nonlinear model. Hence, the second order polynomial nonlinear state space model performs more than a factor of 50 better than the linear one. The spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey) are shown in Figure 6-6. The errors of the linear model are particularly present around the resonance frequency (approximately 60 Hz), i.e., for large signal amplitudes. The errors of the nonlinear model are concentrated around the resonance frequency and close to DC.

[Figure 6-5. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 0.26 mV.]

[Figure 6-6. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).]

Higher model orders and degrees of nonlinearity were also tried out, but none of them gave better results than this second order nonlinear model. We also estimated some state affine


models (see "State Affine Models" on p. 89) of various orders and degrees. In Table 6-1, the validation results for such models of degree 3 and 4 are summarized.

    Model order | degree 3: RMSE [mV] | degree 3: parameters | degree 4: RMSE [mV] | degree 4: parameters
    n=2         | 9.12                | 23                   | 9.12                | 32
    n=3         | 10.51               | 39                   | 7.51                | 55
    n=4         | 9.40                | 59                   | 10.04               | 84
    n=5         | 6.22                | 83                   | 9.47                | 119
    n=6         | 12.65               | 111                  | 13.15               | 160
    n=7         | 11.81               | 143                  | 12.77               | 207

Table 6-1. Validation results for state affine models of degree 3 and 4.

It is clear that the state affine approach yields unsatisfactory results on the Silverbox data set. A possible reason for the poor performance is that the state affine approach approximates the nonlinear behaviour of a system by polynomial functions of the input, while the nonlinear behaviour of the Silverbox mainly consists of a nonlinear feedback of the output. As a result, high system orders are required in order to obtain a good model [69].

6.1.5 Comparison with Other Approaches

At the Symposium on Nonlinear Control Systems (Nolcos) in 2004, a special session was organized around the Silverbox device. The aim was to compare different modelling approaches applied to the same nonlinear device, using the same experimental data. In all the papers participating in this session, the multisine and the Gaussian noise data set were used for estimation and validation, respectively. Before continuing, a warning concerning the validation data is appropriate. As can be seen from Figure 6-2, the amplitude of the last part of the validation input exceeds the amplitude of the estimation data. Hence, for this part of the validation data, extrapolation will occur. It is important to emphasize that the performance achieved in this region is not a good measure of the model quality. It is rather a matter of "luck": if there is an exact correspondence between the internal structure of the DUT and the model structure, this will generally yield good extrapolation behaviour. But if this is not the case, and the estimated model is only an approximation, the extrapolation will in general be

poor. Therefore, the extrapolation behaviour should be discarded in a fair assessment. Note

that the danger of extrapolation also resides in the use of too small amplitudes (for instance

dead zones, which become relatively more important for smaller inputs).

Among the papers, we distinguish three methodologies. The first one is a white box approach

(H. Hjalmarsson [29], J. Paduart [52]), making explicit use of the knowledge about the

internal structure of the Silverbox device. M. Espinoza [22], L. Sragner [76], and V. Verdult

[81] employ a black box approach. Finally, the paper of L. Ljung [39] shows results for black

box and grey box modelling. In the following, we will briefly describe each methodology.

In [29], the internal block structure of the Silverbox is reordered to turn it into a MISO

Hammerstein system. An existing relaxation method for Hammerstein-Wiener systems

(published by E. Bai [1]) is extended to MISO systems, and applied to the Silverbox device.

The model obtained with this method achieves a validation RMSE of 0.96 mV.

Another white box approach is presented in [52]. In Chapter 4, the ideas from this paper are

elaborated. The RMSE on the validation set is 0.38 mV.

In [22], Least Squares Support Vector Machines (LS-SVM) for nonlinear regression are applied to model the Silverbox. The idea here is to consider a model where the inputs are mapped to a high dimensional feature space with a nonlinear mapping \varphi. This feature space is converted to a low dimensional dual space by means of a positive definite kernel K. As the final model is expressed as a function of K, there is no need to compute \varphi explicitly. Furthermore, the dual space allows the model parameters to be estimated by solving a linear least squares problem under equality constraints. In [22], polynomial kernels are used on the Silverbox data, yielding a validation result of 0.32 mV. An even better model is obtained in [23] using a partially linear model (PL-LS-SVM): 0.27 mV. This approach includes the prior knowledge that linear regressors are present in the model.

The model used in [81] is a state space model, composed of a weighted sum of two local second order linear models (LLM). The weights are a function of a scheduling vector, which is chosen equal to the output of the system for this particular DUT. A typical choice for the weighting functions is radial basis functions, which are also used in [81]. The validation result obtained here is 1.3 mV.

In [76], different types of Artificial Neural Networks (ANN) are assessed: Multi-Layer Perceptron (MLP) Networks and \Sigma\Pi Networks, both making use of hyperbolic tangent base functions. In both cases, the maximal time lag for the input and the output is chosen equal to 5. The MLP Network has one hidden layer and contains 60 neurons. The \Sigma\Pi Network has 20 multiplicative and 20 additive elements. The best model is a special MLP Network with only 10 hidden neurons, which also makes use of linear regressors. This results in a RMSE of 7.8 mV.

A whole arsenal of different black and grey box techniques is applied in [39], such as neural

networks, wavenets, block-oriented models, and physical models. The best result is achieved

by a one hidden layer sigmoidal neural network with 30 neurons, using input and output

regressors with a maximal time lag of 10. Note that a custom, cubic regressor is included,

which improves the results significantly. This leads to a RMSE of 0.30 mV.

In Table 6-2, the validation RMSE and the number of required parameters are summarized for the different approaches.

    Author                 Approach                         Validation RMSE [mV]   Number of parameters
    J. Paduart             PNLSS                            0.26                   37
    H. Hjalmarsson, [29]   Physical block-oriented model    0.96                   5
    J. Paduart, [52]       Physical block-oriented model    0.38                   10
    M. Espinoza, [22]      LS-SVM with NARX                 0.32                   490
    M. Espinoza, [23]      PL-LS-SVM with PL-NARX           0.27                   190
    L. Sragner, [76]       MLP-ANN                          7.8                    600
    V. Verdult, [81]       Local Linear State Space model   1.3                    16
    L. Ljung, [39]         NL ARX model                     0.30                   712

Table 6-2. Validation results for various modelling approaches.

From Table 6-2, we conclude that the physical models achieve reasonable validation RMSE values. The main advantage of this approach is the small number of parameters, and the ability to give a physical interpretation to the identified parameters. In general, the black and deep-grey box models show the lowest RMSE values. The price paid for their excellent

performance is the higher number of parameters they require. An exception to the rule in Table 6-2 is formed by the MLP neural networks. Several reasons can cause their poor performance for

this particular set-up. First of all, the hyperbolic tangent functions used in the ANN approach

do not exploit the polynomial behaviour of the Silverbox. Secondly, it could be that the neural

network was not properly initialized, or that the nonlinear search used in the estimation

procedure got stuck in a local minimum. We obtained a good result with the polynomial

nonlinear state space model: a low RMSE value (0.26 mV), and a reasonable number of

parameters (37). The reason why our approach works so well is the correspondence

between the PNLSS model and the internal structure of the Silverbox, which basically consists

of a cubic feedback of the output (see “Nonlinear Feedback” on p. 106). This match is a clear

advantage in a validation test that requires extrapolation. To conclude, three black box models

clearly stand out in this comparison, with RMSE values close to the noise level of 0.25 mV.

This level was obtained from another data set, in similar experimental conditions.


6.2 Combine Harvester

6.2.1 Description of the DUT

The system that we will model in this section is a New Holland CR-960 combine harvester (see

Figure 6-7). Note that there is no grain header mounted at the front side of the harvester. We

will use measurements performed by ir. Tom Coen from the KULeuven, Faculty of Bioscience

Engineering, Department MeBioS. A block scheme of the traction system of the machine is

shown in Figure 6-8. The black connections denote mechanical transmissions; the grey

connection is part of the hydrostatic transmission. The diesel engine delivers the traction

power and is coupled to a hydrostatic pump which on its turn drives a hydrostatic engine. The

speed of the diesel engine is kept at the requested set point by a regulator which varies the

fuel injection. The flow of the hydrostatic pump is controlled by an electric current. The power

is then transferred to the front axle through the mechanical gearbox and the front differential.

The traction system to be modelled has two inputs and one output. The first input is the

steering current of the hydrostatic pump; the second input is the speed setting of the diesel

engine. The engine speed is limited between 1300 and 2100 rotations per minute (rpm); the

steering current of the hydrostatic pump can be varied between 0 % and 100 %. The output

of this MISO system is the measured driving speed, expressed in km/h. A detailed analysis of

Figure 6-7. Combine harvester.


the expected system order of the traction system is presented in [9]. The dynamic behaviour

is mainly located in the pump, and consists of three second order subsystems. A part of these

dynamics can be neglected as they are relatively fast; hence the required model order turned

out to be four.

6.2.2 Description of the Experiments

All experiments were performed on the road, with the gearbox fixed in the second gear. Two

sets of orthogonal random odd, random phase multisines were generated (see “Periodic Data”

on p. 34). Hence, a total of 4 realizations were applied to both input channels. Each

realization consisted of two periods of 4096 samples each, and 192 transient samples. The

RMS values of the multisines for the first and second input were 57 % and 1715 rpm, respectively. The bandwidth of the excitation signals was 2 Hz and the sampling frequency f_s used in the experiments was 20 Hz. The first two realizations will be used to estimate the

models and the remaining two to validate them. Due to timing problems with the PXI

instrumentation system used to perform the experiments, the applied input signals were not

completely periodic. As a consequence, we cannot exploit the periodic nature of the original

signals to separate the measurement noise from the nonlinear contributions. Hence, we treat

the data sequences as if they were non periodic (see “Non Periodic Data” on p. 38).

6.2.3 Best Linear Approximation

To estimate the Multiple Input, Single Output (MISO) BLA and its covariance, we split the estimation data (2 x (2 x 4096 + 192) samples) in M = 32 subrecords of 524 samples.

[Figure 6-8. Traction system of the combine harvester: the speed setting [rpm] of the diesel engine and the steering current [%] of the hydrostatic pump drive, via the hydrostatic engine, the mechanical gearbox and the front axle, the machine speed [km/h].]


Then, we compute the auto- and cross spectra with equation (2-24). Finally, the BLA is obtained with equation (2-41), and its covariance with equation (2-43). Next, a 4th, 5th, and 6th order linear model is estimated with a subspace method (see "Frequency Domain Subspace Identification" on p. 115) from the BLA and its covariance matrix. The subspace method is then followed by a nonlinear optimization. Figure 6-9 shows the BLA (solid black line) and the total standard deviation (dashed black line), together with the 6th order linear model (solid grey line) and the amplitude of the complex model error (dashed grey line). G11 is the transfer function from the steering current (input 1) to the measured speed (output 1); G12 is the transfer function from the diesel engine's speed setting (input 2) to the measured speed (output 1).

[Figure 6-9. The MISO BLA (G11 and G12) of the combine harvester (solid black line); total standard deviation (dashed black line); 6th order linear model (solid grey line); model error (dashed grey line).]

Next, a validation test is carried out with the 6th order linear model. In Figure 6-10, the model error for the validation data is shown (grey) together with the

measured output (black). The validation data consists of two merged multisine realizations.

Therefore, two transient phenomena are present in the model error: one at the start of the

data set and one around 400 s. For the calculation of the RMSE (0.73 km/h), we discard 200

samples at the start of each realization in order to eliminate the effect of the transients. When

taking a closer look at the simulation error, we observe periodic residuals. This effect can be

caused by periodic disturbances (e.g. coupling with 50 Hz mains), or by unmodelled dynamics

(since a quasi-periodic excitation signal was employed). In the next section, it will become

clear that there are unmodelled dynamics.
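For non-periodic (or, as here, not perfectly periodic) data, the MISO BLA follows from averaged auto- and cross-spectra over the subrecords, G(k) = S_YU(k) S_UU(k)^{-1}. A sketch of that estimator is given below (Python/NumPy; illustrative only, without the windowing and bias corrections of the estimator in Chapter 2).

```python
import numpy as np

def miso_bla(u, y, M):
    """MISO BLA from M subrecords.  u: (N, nu) inputs, y: (N,) output."""
    N, nu = u.shape
    L = N // M
    Suu = np.zeros((L, nu, nu), dtype=complex)
    Syu = np.zeros((L, nu), dtype=complex)
    for m in range(M):
        U = np.fft.fft(u[m * L:(m + 1) * L], axis=0)
        Y = np.fft.fft(y[m * L:(m + 1) * L])
        Suu += U[:, :, None] * U[:, None, :].conj()   # averaged auto-spectra S_UU(k)
        Syu += Y[:, None] * U.conj()                  # averaged cross-spectra S_YU(k)
    # one row G(k) = S_YU(k) S_UU(k)^{-1} per frequency line
    return np.stack([np.linalg.solve(Suu[k].T, Syu[k]) for k in range(L)])
```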

6.2.4 Nonlinear Model

For the nonlinear modelling, we use as starting values the 4th, 5th, and 6th order linear models obtained in the previous step. Two types of nonlinear state space models are considered: polynomial nonlinear models and state affine models. For the polynomial models, we have observed that a nonlinear output equation does not enhance the modelling results. Therefore, we only show the results for models with a linear output equation (\eta(t) = 0). We have also noticed that the nonlinear combinations with the inputs in the state equation do not improve the results. Hence, these terms are omitted in what follows.

[Figure 6-10. Validation result for the 6th order linear model: measured output (black) and model simulation error (grey). RMSE: 0.73 km/h.]

The validation RMSE of the estimated polynomial nonlinear models is shown in Table 6-3 (left: \zeta(t) = x(t)^{(3)}, and right: \zeta(t) = x(t)^{\{3\}}). From this table, it is clear that the RMSE decreases for higher model orders,

while the number of parameters increases significantly. The best result achieved with the

polynomial nonlinear approach is a RMSE of 0.35 km/h using a 6th order model with nx=[3].

The validation error of this model is shown in Figure 6-11 (grey), together with the measured

output (black). The two transients are discarded in the same way as described in the previous

section. Figure 6-12 shows the spectra of the measured validation output (black), the linear

simulation error (light grey), and the nonlinear simulation error (dark grey). From this plot, we

observe that the nonlinear model reduces the linear model error between DC and 1 Hz, but

for higher frequencies no significant improvement is obtained.

Furthermore, state affine models of degree 3 and 4 are also estimated. The validation results

for these models are shown in Table 6-4. A 5th order model of degree 3 yields the best result

(0.39 km/h). No clear trends are visible in Table 6-4; the results are comparable to what we

obtained with the polynomial nonlinear state space models. Hence, both approaches perform equally well.

    Model order | nx=[3], ny=[]: RMSE [km/h] | nx=[3], ny=[]: parameters | nx=[2 3], ny=[]: RMSE [km/h] | nx=[2 3], ny=[]: parameters
    n=4         | 0.63                       | 94                        | 0.63                         | 134
    n=5         | 0.57                       | 192                       | 0.54                         | 267
    n=6         | 0.35                       | 356                       | 0.36                         | 482

Table 6-3. Validation results for the polynomial nonlinear state space models.

[Figure 6-11. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 0.35 km/h.]

    Model order | degree 3: RMSE [km/h] | degree 3: parameters | degree 4: RMSE [km/h] | degree 4: parameters
    n=4         | 0.46                  | 149                  | 0.45                  | 254
    n=5         | 0.39                  | 209                  | 0.40                  | 359
    n=6         | 0.41                  | 279                  | 0.44                  | 482

Table 6-4. Validation results for state affine models of degree 3 and 4.

[Figure 6-12. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).]

6.3 Semi-active Damper

6.3.1 Description of the DUT

This application concerns the modelling of a magneto-rheological (MR) damper. This damper

is called semi-active, because the characteristics of the viscous fluid inside the damper are

influenced by a magnetic field. Hence, the relation between the force over the damper and

the position/velocity of the piston is changed.

Two quantities serve as input to this system: the reference signal applied to the PID controller

to regulate the piston position via the shaker, and the current which determines the magnetic

field over the viscous fluid. As system output, we consider the force over the damper, which is

measured by a load cell. The measurement set-up is shown in Figure 6-13. For an ideal, linear

damper, we expect to obtain an improper first order model, i.e., the theoretical relationship

between displacement and force for a perfect damper. Due to the non-idealities of the device,

the required model order will turn out to be higher.

6.3.2 Description of the Experiments

Both the construction of the set-up and the measurements were carried out by ir. Kris

Smolders from the PMA Department of the KULeuven. He applied three realizations of a full

grid, random phase multisine to the DUT. The multisines were excited in a frequency band

between 0.12 Hz and 10 Hz, and 6 periods per realization were measured with 65 536

samples per period. In all the measurements, a sampling frequency f_s of 2000 Hz was used.

[Figure 6-13. Measurement set-up of the magneto-rheological damper: shaker, piston, viscous fluid, damper current, and load cell.]


A slow DC trend present in the measured output data was removed prior to the estimation

procedures. After removal of the DC levels, the signals applied to the first (piston reference)

and second input (damper current) of the DUT have a RMS value of 39 mV and 194 mV,

respectively. The first two multisine realizations are used for the estimation of the models, and

the third realization for the validation.

6.3.3 Best Linear Approximation

First, we will estimate the device's Best Linear Approximation. Unfortunately, only two realizations were available. For a dual input system, this is sufficient to calculate the BLA, but not enough to determine an estimate of the covariance. Hence, the approach described in "Periodic Data" on p. 34 is not suitable. Therefore, we employ the method described in "Non Periodic Data" on p. 38 to determine the BLA from the averaged input/output data. We compute the auto- and cross spectra with equation (2-24), with M = 32 blocks of 4096 samples. Finally, the BLA is obtained with equation (2-41) and its covariance with equation (2-43). Then, some linear models with different model orders (2nd to 5th order) are estimated from the BLA and its covariance matrix, using a subspace method (see "Frequency Domain Subspace Identification" on p. 115), which is followed by a nonlinear optimization. Figure 6-14 shows the MISO BLA (solid black line) and the total standard deviation (dashed black line), together with the 3rd order linear model (solid grey line) and the amplitude of the complex model error (dashed grey line). G11 is the transfer function from the piston reference (input 1) to the measured force (output 1); G12 is the transfer function from the damper current (input 2) to the measured force (output 1). G11 behaves as expected for a damper: ideally, the force over the damper should be proportional to the velocity of the piston, i.e., j\omega times the displacement. This is, indeed, roughly what we observe for G11. Furthermore, from the top plots it can be seen that the relative uncertainty on G12 is high compared with the one on G11. Hence, the estimated linear model is mainly determined by G11. Next, a validation test is carried out with the 3rd order linear model. In Figure 6-15, the model error for the validation data is shown (grey), together with the measured output (black). The RMS value of the model error (34 mV) is quite high compared with the RMS value of the measured output (71 mV). We will reduce this error using nonlinear models.

[Figure 6-14. The MISO BLA (G11 and G12) of the semi-active damper (solid black line); 3rd order linear model (solid grey line); total standard deviation (dashed black line); model error (dashed grey line).]

[Figure 6-15. Validation result for the 3rd order linear model: measured output (black) and model simulation error (grey). RMSE: 33.92 mV.]

6.3.4 Nonlinear Model

For the nonlinear modelling, we use as starting values the 2nd to 5th order linear models obtained in the previous step. Again, two types of nonlinear state space models are considered: PNLSS and state affine models. For the polynomial models, we have observed that using a nonlinear relation for both the state and the output equation always yields better modelling results. Hence, neither a linear state equation nor a linear output equation is considered in what follows. For the PNLSS model, we will make a distinction between two choices for the nonlinear vectors \zeta(t) and \eta(t). First, we take into account all nonlinear combinations using the states and the inputs (referred to as "full", \zeta(t) = \eta(t) = \xi(t)^{\{3\}}, with \xi(t) = [x(t); u(t)]). Secondly, we consider only the nonlinear combinations of the states, without the inputs (referred to as "states only", \zeta(t) = \eta(t) = x(t)^{\{3\}}). The validation RMSE of the estimated polynomial models is given in Table 6-5. It is clear that the RMSE decreases for higher model orders up to n=4. We also conclude that taking into account the nonlinear combinations of the input improves, on average, the RMSE at the price of a significantly higher number of parameters. For the PNLSS approach, the best result is achieved using a "states only" 4th order model with degree nx=[2 3], ny=[2 3], resulting in a RMSE of 6.6 mV. This is a reduction of the model error by a factor of 5 compared with the linear model. The validation error for the best nonlinear model is given in Figure 6-16 (grey), together with the measured output (black). Figure 6-17 shows the spectra of the measured validation output signal (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey). This plot illustrates that the nonlinear model squeezes down the model error over a broad frequency range.

    Model order | "full" nx=[2 3], ny=[2 3]: RMSE [mV] | parameters | "states only" nx=[2 3], ny=[2 3]: RMSE [mV] | parameters
    n=2         | 12.4                                 | 98         | 22.9                                        | 29
    n=3         | 8.2                                  | 211        | 8.9                                         | 75
    n=4         | 6.7                                  | 399        | 6.6                                         | 164
    n=5         | 9.7                                  | 689        | 10.3                                        | 317

Table 6-5. Validation results for the PNLSS models with (left) all the nonlinear combinations, and (right) without nonlinear combinations using the input.

[Figure 6-16. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 6.6 mV.]
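The gap in parameter count between the "full" and "states only" variants stems from the number of candidate monomials of degrees 2 and 3. A generic count is sketched below (Python; illustrative only: the exact parameter totals in the tables also include the linear state space matrices and depend on the precise bookkeeping used in the thesis).

```python
from math import comb

def n_monomials(n_vars, degrees=(2, 3)):
    """Number of distinct monomials of the given total degrees in n_vars variables."""
    return sum(comb(n_vars + d - 1, d) for d in degrees)

na, nu = 4, 2
print(n_monomials(na + nu))   # candidate monomials in [x; u]  ("full")
print(n_monomials(na))        # candidate monomials in x only  ("states only")
```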

Furthermore, different state affine models of degree 3 and 4 are estimated. The validation

results for these models are given in Table 6-6. A 4th order model of degree 4 yields the best

result (13.5 mV).

[Figure 6-17. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).]

For this DUT, the PNLSS approach performs clearly better than the state affine approach. By employing a 4th order PNLSS model, the simulation error on the validation set was reduced by more than a factor of 5 compared with the BLA: from 34 mV to 6.6 mV. This result should

be compared with the noise level (1.8 mV), which can easily be determined since several

periods of the measured data are available. Hence, it should be possible to reduce the model

error with an additional factor 3. However, it was not possible to achieve this with the PNLSS

approach. Maybe the nonlinear optimization got stuck in a local minimum, or the model order/

degree should be increased further in order to obtain better results.

    Model order | degree 3: RMSE [mV] | degree 3: parameters | degree 4: RMSE [mV] | degree 4: parameters
    n=2         | 26.5                | 59                   | 23.2                | 98
    n=3         | 17.7                | 99                   | 17.1                | 167
    n=4         | 13.8                | 149                  | 13.5                | 254
    n=5         | 14.5                | 209                  | 30.2                | 359

Table 6-6. Validation results for different state affine models of degree 3 and 4.

6.4 Quarter Car Set-up

6.4.1 Description of the DUT

In this test case, we study a quarter car set-up which is situated at the PMA Department of

the KULeuven, and which was built by ir. Kris Smolders [72]. The set-up is a scale model of a

car suspension based on masses, springs, and the magneto-rheological damper that was

modelled in section 6.3.

The system is excited by a hydraulic shaker which emulates, by means of a PID controller, the

vertical road displacement. The reference signal for the position of the shaker serves as

system input. The force over the damper is considered as the system output and is measured

with a load cell, which is placed between the damper and the car mass. Taking into account

the various interactions between the masses, springs and damper, and the shaker dynamics,

the expected model order is about six (for a more elaborate discussion, see [72]).

6.4.2 Description of the Experiments

[Figure 6-18. Quarter car set-up: shaker, wheel mass, magneto-rheological damper with load cell, and car mass.]

K. Smolders applied two realizations of a full grid, random phase multisine (RPM), which was excited in a frequency band between 0.05 Hz and 10 Hz. Per multisine realization, 10 periods were measured, with 40 000 samples per period. Furthermore, a filtered Gaussian noise (GN)

were measured, with 40 000 samples per period. Furthermore, a filtered Gaussian noise (GN)

sequence with a linearly increasing RMS value over time was applied to the system. This

signal consisted of about 280 000 data samples. The RMS value of the multisine and the noise

sequence are, respectively, 75 mV and 58 mV. Both data sets are shown in Figure 6-19. In the

plot on the right side, the RMS value of the RPM sequence is given (light grey line), together

with the RMS value of the GN data set (dark grey line). The latter is calculated per block of

8 000 samples. From this plot, we observe that the RMS value of the Gaussian noise sequence

exceeds the RMS value of the multisine data set around t=100 s. For larger values of t, we

end up in the extrapolation zone (grey block). In all the measurements, the current applied to

the semi-active damper was fixed to 1 A, and a sampling frequency f_s of 2000 Hz was used.

Prior to the estimation, a slow DC trend that stems from the load cell sensor was removed from all the measured data, using linear detrending. Originally, the RPM data set was intended for the estimation, and the GN data set for the validation. However, this leads to poor modelling results, even when the GN sequence is only used up to t = 100 s. A possible explanation for this is the fact that the spectrum of the GN is broader than the RPM's spectrum (see Figure 6-20). Hence, we decided to interchange the roles of both data sets such that spectral extrapolation is avoided: the GN data set now serves for the estimation of the models, and the RPM data set for the validation.

[Figure 6-19. (a) Random Phase Multisine data set, and (b) Gaussian Noise data set. RMS of the RPM sequence (light grey line), RMS of the GN data set (dark grey line), and extrapolation zone (grey block).]

[Figure 6-20. DFT spectrum of (a) the RPM signal, and (b) the GN data set.]

6.4.3 Best Linear Approximation

Since the GN data set is employed for the estimation of the models, the approach described in

“Non Periodic Data” on p. 38 is employed to determine the BLA. First, we compute the auto-

and cross spectra with equation (2-24), with blocks of 8 000 samples. Then, the BLA

is obtained with equation (2-41) and its covariance with equation (2-43). From this data, 4th

to 6th order linear models are estimated using a subspace method (see “Frequency Domain

Figure 6-21. BLA of the quarter car set-up (solid black line); Total standard deviation (black dashed line); 4th order linear model (solid grey line); Model error (dashed grey line).


Figure 6-21 shows the BLA (solid black line) and the total standard deviation (dashed black line),

together with the 4th order linear model (solid grey line) and the amplitude of the complex

model error (dashed grey line). Next, the 4th order linear model is validated on the RPM data

set. In Figure 6-22, the simulation error for the validation data is given (grey) together with

the measured output (black). The RMS value of the model error (136 mV) is quite high

compared to the RMS value of the measured output (285 mV). Hence, we will try to reduce

this error with the PNLSS approach.

6.4.4 Nonlinear Model

In what follows, we only show the results for the PNLSS models. State affine models were

also estimated but are omitted here, since they yielded poor results. In Table 6-7, the

validation results are shown for PNLSS models of various orders, with a nonlinear state and

output equation. Two kinds of models are discussed: models that contain all the nonlinear

combinations of the states and the inputs (PNLSS, "full", ζ(t) = η(t) = ξ(t)^{3}), and models that only employ the nonlinear combinations of the states (PNLSS, “states only”, ζ(t) = η(t) = x(t)^{3}). The entries that are indicated with “N.A.” correspond to models which could not be estimated due to memory restrictions. In this respect, recall that the

estimation data set consists of about 280 000 data samples. The best validation result is

achieved by the 5th order model from the right hand side of the table, giving a simulation error of 44 mV.

Figure 6-22. Validation result for the 4th order linear model (RMSE: 136 mV): measured output (black) and model simulation error (grey).


Since the validation data are periodic and several periods were measured, we

can compare this figure to the noise level at the output, which is 1.8 mV. Apparently, a

significant amount of unmodelled dynamics is still present in the residuals, although the

model error decreased by more than a factor of 3 compared with the linear model. Hence, this DUT is an example where the PNLSS approach delivers unsatisfactory results. A higher model

order or an increased nonlinear degree might improve the results, but the size of the data set

prevents the estimation of such models due to memory restrictions. Another possible

explanation for the poor result is that the polynomial approximation is not suited for this set-

up, e.g. due to the presence of hard saturation in the DUT. In Figure 6-24, the spectra of the

measured validation output signal (black), the linear simulation error (light grey), and the

nonlinear simulation error (dark grey) are shown. From this plot, it can be seen that a

significant model error reduction is achieved by the nonlinear model between DC and

approximately 50 Hz.
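The error measures used throughout this chapter, i.e. the time-domain RMSE and the DFT spectra of the simulation errors (as in Figure 6-24), can be computed along the following lines; a minimal sketch, in which the dB scaling convention of the plots is not specified and is therefore only indicative.

    import numpy as np

    def rmse(y_measured, y_simulated):
        """RMS value of the simulation error (as quoted in the tables)."""
        return np.sqrt(np.mean((y_measured - y_simulated) ** 2))

    def error_spectrum_db(y_measured, y_simulated):
        """DFT spectrum of the simulation error, in dB (up to a scaling convention)."""
        E = np.fft.rfft(y_measured - y_simulated) / len(y_measured)
        return 20 * np.log10(np.abs(E) + np.finfo(float).eps)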

                 PNLSS, "full"                                   PNLSS, "states only"
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[2 3]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=4              104                    259                      107                    159
n=5              N.A.                   N.A.                     44                     311
n=6              N.A.                   N.A.                     50                     552

Table 6-7. Validation results for the PNLSS models with all the nonlinear combinations (left), and without nonlinear combinations with the input (right).
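The two model structures compared in Table 6-7 differ only in which monomials enter the nonlinear part. Written in the commonly used PNLSS form x(t+1) = A x(t) + B u(t) + E ζ(t), y(t) = C x(t) + D u(t) + F η(t), the vectors ζ(t) and η(t) collect the degree-2 and degree-3 monomials of ξ(t) = [x(t); u(t)] ("full") or of the states only. The sketch below simulates such a SISO model; the matrix names follow that convention, and the monomial construction is consistent with the parameter counts in the table once the n² degrees of freedom of a state similarity transform are discounted (e.g. 311 parameters for the 5th order "states only" model).

    import itertools
    import numpy as np

    def monomials(v, degrees=(2, 3)):
        """All monomials of the entries of v of the given degrees (nonlinear degree [2 3])."""
        return np.array([np.prod(c) for d in degrees
                         for c in itertools.combinations_with_replacement(v, d)])

    def simulate_pnlss(A, B, C, D, E, F, u, states_only=False, x0=None):
        """Simulate a SISO polynomial nonlinear state space model:
            x(t+1) = A x(t) + B u(t) + E zeta(t)
            y(t)   = C x(t) + D u(t) + F eta(t)
        with zeta(t) = eta(t) the degree-[2 3] monomials of xi(t) = [x(t); u(t)]
        ("full") or of x(t) only ("states only")."""
        x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
        y = np.empty(len(u))
        for t, ut in enumerate(u):
            z = monomials(x if states_only else np.append(x, ut))
            y[t] = C @ x + D * ut + F @ z
            x = A @ x + B * ut + E @ z
        return y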

Figure 6-23. Validation result for the best nonlinear model (RMSE: 44 mV): measured output (black) and model simulation error (grey).


Figure 6-24. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).


6.5 Robot Arm

6.5.1 Description of the DUT

In this case study, we will model a robot arm (see Figure 6-25) that was constructed by

ir. Thomas Delwiche and his co-workers from the Control Engineering Department of the

Université Libre de Bruxelles (ULB). The goal of his research is to design a controller for the

robot arm such that it can be used for long-distance surgery. The manipulations carried out by

a surgeon with the master robot arm should be repeated accurately by a slave device, and

should give force feedback to the surgeon. T. Delwiche carried out experiments on the device

in cooperation with the Department ELEC of the Vrije Universiteit Brussel. The robot arm

rotates by means of a DC motor that is driven by a servo-amplifier, which incorporates a

controller. The reference voltage sent to the servo-amplifier serves as input of the system.

The input signal is proportional to the couple applied to the arm (1 V = 12.95·10^-3 Nm). The output of the system is the angle of the arm, measured with a 1024 counts per turn encoder connected to the motor shaft (1 V corresponds to 90º). Furthermore, the speed of the arm is fed back

through a controller in order to introduce damping in the system. This feedback loop is

considered as an intrinsic part of the DUT. When we neglect the dynamics of the DC motor

and the nonlinear effects in the set-up, we expect a second order relationship between the

couple applied to the arm, and the resulting angle.

Figure 6-25. Robot arm.


6.5.2 Description of the Experiments

T. Delwiche performed several multisine experiments on the robot arm, using different RMS

input levels and different bandwidths for the excitation signal. All experiments were

performed at a sampling frequency of 10 MHz/2^14 = 610.35 Hz. From all the available data,

we selected a set of experiments in which the excitation signal has a bandwidth of 30 Hz and

a RMS value of 80 mV. Ten realizations of a random odd, random phase multisine were

applied to the DUT. Each realization consisted of two periods with 24 415 samples per period.

We use eight of these realizations (a total of 195 320 samples) to estimate the models and

the remaining two realizations (48 830 samples) to validate them.
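For reference, one period of such a random odd, random phase multisine can be generated as in the sketch below. Only odd harmonics in the excited band receive a random phase; in every group of consecutive odd lines, one randomly chosen line is left unexcited as a detection line. The group size and the commented example values are illustrative assumptions, not values taken from these experiments.

    import numpy as np

    def random_odd_multisine(N, fs, fmax, rms=0.08, group=4, rng=None):
        """One period of a random odd, random phase multisine.
        Odd harmonics up to fmax are excited with phases uniform in [0, 2*pi);
        in every group of `group` consecutive odd lines one is skipped
        (an odd detection line). The group size is an assumption."""
        rng = np.random.default_rng() if rng is None else rng
        kmax = int(fmax / fs * N)                 # highest excited harmonic
        odd = np.arange(1, kmax + 1, 2)
        excited = []
        for i in range(0, len(odd), group):
            block = odd[i:i + group]
            skip = rng.integers(len(block))       # detection line within the group
            excited.extend(int(k) for j, k in enumerate(block) if j != skip)
        U = np.zeros(N // 2 + 1, dtype=complex)
        U[excited] = np.exp(1j * 2 * np.pi * rng.random(len(excited)))
        u = np.fft.irfft(U, N)
        return rms * u / np.std(u)                # scale to the desired RMS value

    # e.g. one period with the settings of these experiments:
    # u = random_odd_multisine(N=24415, fs=610.35, fmax=30, rms=0.08)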

Figure 6-26. BLA of the robot arm (solid black line); Total standard deviation (dashed black line); Measurement noise level (dotted black line); 2nd order linear model (solid grey line); Model error (dashed grey line).


6.5.3 Best Linear Approximation

First, we calculate the BLA with formula (2-30). Since periodic excitations were used and more

than one period per realization was measured, it is possible to distinguish the nonlinear

contributions and the measurement noise. Figure 6-26 shows the estimated BLA of the robot

arm (solid black line). Furthermore, the total standard deviation σ_BLA(k) due to the combined effect of measurement noise and nonlinear distortions (dashed black line), and the standard deviation σ_n(k) due to the measurement noise (dotted black line) are also plotted.

We see that the total standard deviation lies significantly higher than the measurement noise,

indicating that the nonlinear behaviour is dominant compared with the measurement noise.

Then, a number of linear models are estimated with subspace techniques, followed by a

nonlinear optimization of the cost function (5-101), using only the excited frequency lines.

The best result is achieved by a 3rd order linear model. This model is also plotted in Figure 6-

26 (solid grey line), together with the model error (dashed grey line). Although these models

seem to fit the nonparametric Best Linear Approximation estimate well, they deliver poor

validation results. Hence, a second nonlinear optimization is applied: this time using all the

frequency lines including DC. After this optimization, the 3rd order linear model is validated.

The result is presented in Figure 6-27, which shows the measured output (black) and the

model error (grey). The RMS error of this model is 34.6 mV. This should be compared to the

RMS level of the output which is 218 mV. We will now try to reduce this model error by

estimating a number of nonlinear state space models.
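The nonlinear optimization of the linear model can be illustrated as a variance-weighted frequency-domain fit of a parametric transfer function to the nonparametric BLA; the exact cost is the one defined by equation (5-101) in the thesis, and the sketch below is only a schematic Python version of such a weighted least-squares fit.

    import numpy as np
    from scipy.optimize import least_squares

    def fit_tf_to_bla(G_bla, sigma_bla, freqs, fs, nb, na, theta0):
        """Weighted fit of a discrete-time transfer function to the BLA,
        evaluated only on the frequency lines contained in `freqs`
        (e.g. the excited lines, or all lines including DC)."""
        zinv = np.exp(-1j * 2 * np.pi * freqs / fs)        # z^-1 on the unit circle

        def residuals(theta):
            b = theta[:nb + 1]                             # numerator coefficients
            a = np.concatenate(([1.0], theta[nb + 1:]))    # monic denominator
            G = np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)
            r = (G - G_bla) / sigma_bla                    # variance-weighted error
            return np.concatenate([r.real, r.imag])        # real residual vector

        return least_squares(residuals, theta0, method='lm')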

Figure 6-27. Validation result for the 3rd order linear model (RMSE: 34.6 mV): measured output (black) and model simulation error (grey).


6.5.4 Nonlinear Model

Again, we start by estimating some polynomial nonlinear state space models, using the linear

models obtained in the previous step as starting values. The results are given in Table 6-8.

The second column shows models with a nonlinear state and output equation (ζ(t) = η(t) = ξ(t)^{3}); the third column shows models that only have a nonlinear state equation (ζ(t) = ξ(t)^{3}, η(t) = 0). The best result is achieved by the 3rd order model of

the second column (13.5 mV).

Furthermore, state affine models of degree 3 and 4 are estimated. The results are

summarized in Table 6-9. An increasing model order improves the RMSE, apart from some

exceptions where the nonlinear optimization probably got stuck in a local minimum.

                 PNLSS                                           PNLSS
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=2              23.2                   53                       24.1                   37
n=3              13.5                   127                      22.2                   97
n=4              24.3                   259                      16.3                   209

Table 6-8. Validation results for the polynomial nonlinear state space models.

                 State Affine                                    State Affine
                 degree 3                                        degree 4
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=2              21.1                   23                       21.3                   32
n=3              23.8                   39                       23.5                   55
n=4              17.3                   59                       16.7                   84
n=5              7.6                    83                       18.2                   119
n=6              5.9                    111                      8.4                    160
n=7              12.3                   143                      15.1                   207
n=8              9.3                    179                      18.3                   260
n=9              5.3                    219                      6.0                    319

Table 6-9. Validation results for state affine models of degree 3 and 4.
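State affine models approximate the system with coefficients that are polynomials in the input: x(t+1) = A(u(t)) x(t) + B(u(t)), y(t) = C(u(t)) x(t) + D(u(t)). The sketch below simulates this generic form; the exact parameterization and degree convention used for the models in Table 6-9 may differ, so it is only meant to illustrate the structure.

    import numpy as np

    def simulate_state_affine(A_list, B_list, C_list, D_list, u, x0=None):
        """Generic discrete-time state affine model:
        x(t+1) = sum_i u(t)**i * (A_i x(t) + B_i),
        y(t)   = sum_i u(t)**i * (C_i x(t) + D_i),
        i.e. affine in the state, with input-polynomial coefficients."""
        n = A_list[0].shape[0]
        x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
        y = np.zeros(len(u))
        for t, ut in enumerate(u):
            powers = ut ** np.arange(len(A_list))
            y[t] = sum(p * (C @ x + D) for p, C, D in zip(powers, C_list, D_list))
            x = sum(p * (A @ x + B) for p, A, B in zip(powers, A_list, B_list))
        return y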


For the robot arm, the state affine approach clearly yields better results than the PNLSS

approach when it comes to minimizing the RMSE. To see this, we compare the best RMSE

achieved on the validation set: 5.3 mV versus 13.5 mV. The validation test of the best

nonlinear model is shown in Figure 6-28. Compared with the linear model, the model error is

reduced by almost a factor of 7: from 34.6 mV to 5.3 mV. Although this is a good result, the

smallest validation error is still large compared with the noise level (0.4 mV). This indicates

that there are still unmodelled dynamics in the residuals.

Figure 6-28. Validation result for the best nonlinear model (RMSE: 5.3 mV): measured output (black) and model simulation error (grey).

Figure 6-29. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).


Figure 6-29 shows the DFT spectra of the measured validation output signal (black), the linear simulation error (light grey), and

the nonlinear simulation error (dark grey). The nonlinear model visibly reduces the model

error over a broad spectral range. The remaining errors are concentrated around DC and stem

from the (low frequency) drift problems observed during the measurements.


6.6 Wiener-Hammerstein

6.6.1 Description of the DUT

In this section, we will model an electronic circuit with a Wiener-Hammerstein structure,

designed by Gerd Vandersteen [82] from the Vrije Universiteit Brussel, Department ELEC. The

system is composed of a static nonlinear block, sandwiched between two linear dynamic

systems.

The first linear system is a 3rd order Chebyshev low-pass filter with a 0.5 dB ripple and a pass

band up to 4.4 kHz. The static nonlinearity is realized by resistors and a diode. The second

linear system is a 3rd order inverse Chebyshev low-pass filter with a -40 dB stop band,

starting at 5 kHz.
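To make this structure concrete, the sketch below simulates a comparable LTI / static nonlinearity / LTI cascade with digital stand-ins for the two filters described above; the diode characteristic is a hypothetical placeholder, since the exact nonlinearity of the circuit is not given here.

    import numpy as np
    from scipy import signal

    fs = 51.2e3                                      # sampling frequency of the measurements

    # Digital stand-ins for the two analog filters (illustrative, not the real circuit):
    b1, a1 = signal.cheby1(3, 0.5, 4.4e3, fs=fs)     # 3rd order Chebyshev, 0.5 dB ripple, 4.4 kHz
    b2, a2 = signal.cheby2(3, 40, 5e3, fs=fs)        # 3rd order inverse Chebyshev, 40 dB stop band from 5 kHz

    def diode_nonlinearity(x):
        # hypothetical resistor/diode characteristic
        return np.where(x > 0.0, x, 0.1 * x)

    def wiener_hammerstein(u):
        """Simulate the LTI - static NL - LTI cascade of Figure 6-30."""
        p = signal.lfilter(b1, a1, u)                # first linear dynamic block
        q = diode_nonlinearity(p)                    # static nonlinear block
        return signal.lfilter(b2, a2, q)             # second linear dynamic block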

6.6.2 Description of the Experiments

The excitation signal consists of two parts: four periods of a random odd, random phase

multisine with 16 384 samples per period, and about 170 000 data points of filtered Gaussian

noise. Both signals have a bandwidth of 10 kHz and a RMS value of about 640 mV. The

multisine will be utilized for the estimation procedure, and the Gaussian noise for validation

purposes. We performed the measurements at a sampling frequency of 51.2 kHz.

6.6.3 Level of Nonlinear Distortions

First, we will analyse the level of nonlinear distortions from the multisine experiment. In

Figure 6-31, the spectrum of the averaged output is shown. The solid black line represents

the output at the excited lines. The grey circles and crosses denote the contributions at the

odd and even detection lines, respectively.

Figure 6-30. Wiener-Hammerstein system.


In order to improve the visibility of the figure, the number of plotted contributions on the detection lines is reduced. From Figure 6-31, it is clear

that the nonlinear distortions lie in the pass band about 20 dB below the linear contributions.

Furthermore, the even nonlinear distortions slightly dominate the odd nonlinear contributions.

The standard deviation on the excited lines, which is a measure for the measurement noise, is

also plotted (dashed black line). We see that in the pass band, the noise level is about 30 dB

lower than the nonlinear distortion level.

6.6.4 Best Linear Approximation

We now calculate G_BLA(jω_k) using the multisine data set. Since several periods were measured, the variance σ_n²(k) due to the measurement noise is estimated using formula (2-17). To calculate the total variance σ_BLA²(k) (i.e., the effect of the nonlinear distortions and the measurement noise), we cannot use equation (2-21), because only one multisine realization was applied. However, the level of nonlinear distortions at the non-excited harmonic lines can be interpolated to the excited frequency lines. This allows us to calculate the total variance on the BLA. The BLA is plotted in Figure 6-32 (solid black line), together with the standard deviation σ_n(k) due to the measurement noise (dotted black line), and the total standard deviation σ_BLA(k) (dashed black line).
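A minimal sketch of this interpolation step is given below (illustrative names; the proper scaling from output distortion power to a variance on the FRF, which also involves the input spectrum and the number of averaged periods, is omitted).

    import numpy as np

    def total_std_on_excited_lines(k_excited, k_detection, distortion_power, noise_var):
        """Interpolate the distortion power seen at the non-excited detection lines
        to the excited lines and combine it with the measurement noise variance."""
        distortion = np.interp(k_excited, k_detection, distortion_power)
        return np.sqrt(noise_var + distortion)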

Figure 6-31. Averaged output spectrum: Excited lines (solid black line), Standard deviation (dashed black line), Odd nonlinear distortion (grey circles), Even nonlinear distortion (grey crosses).


Figure 6-32. BLA of the Wiener-Hammerstein circuit (solid black line); Total standard deviation (dashed black line); Measurement noise level (dotted black line); 6th order linear model (solid grey line); Model error (dashed grey line).

Figure 6-33. Validation result for the 6th order linear model (RMSE: 36.2 mV): measured output (black) and model simulation error (grey).


Linear models of various orders are estimated. For this, a subspace technique is used which is

followed by a numeric optimization, both carried out in the frequency domain. The 6th order

linear model yields the best result and is shown in Figure 6-32 (solid grey line), together with

the model error (dashed grey line). Next, the 6th order linear model is validated using the

Gaussian noise sequence. In Figure 6-33, the simulation error (grey) is plotted together with

the measured output (black). The RMSE is quite high compared to the RMS level of the

output: 36.2 mV versus 213 mV. Note also that the asymmetric behaviour of the model error

is in agreement with the dominant even nonlinear behaviour of the system.

6.6.5 Nonlinear Model

The linear models obtained in the previous section are now used as starting values to

estimate a number of polynomial nonlinear state space models. We estimate models of 4th,

5th, and 6th order. Two kinds of models are discussed: models that use all the nonlinear

combinations of the states and the input (ξ(t)^{3}, “full”), and models that use the nonlinear combinations of the states only (x(t)^{3}, “states only”). We also verify whether the use of a

linear or nonlinear output equation influences the results. Table 6-10 shows the modelling

results for the “full” PNLSS models.

In Table 6-11, the validation results are shown for models that do not use the input in the

nonlinear combinations. Since there are fewer nonlinear terms in these models, the number of

required parameters is significantly lower. However, the RMSE values are always higher than

the corresponding entries in Table 6-10. When taking a closer look at Table 6-10, we observe

that the models with a linear output equation (η(t) = 0, on the right side of the table) always yield better results than the models with a nonlinear output equation.

                 PNLSS, “full”                                   PNLSS, “full”
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=4              9.60                   259                      8.33                   209
n=5              3.70                   473                      3.61                   396
n=6              3.32                   797                      3.21                   685

Table 6-10. Validation results for the “full” PNLSS models.


The best PNLSS model is the 6th order model with a linear output equation that uses all the nonlinear

combinations of the states up to degree [2 3]. It has a validation RMSE of 3.21 mV.

Next, we estimate state affine models of various orders and of degree 3 and 4. Looking at

Table 6-12, which shows the validation RMSE for the state affine approach, we observe a

similar trend as with the robot arm data: the RMSE diminishes smoothly as the model order

increases.

The best validation result is achieved by the 10th order model of degree 4, with a RMSE of

2.6 mV. The simulation error of this model is plotted in Figure 6-34 (grey), together with the

measured output signal (black).

                 PNLSS, “states only”                            PNLSS, “states only”
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=4              12.0                   159                      12.18                  129
n=5              4.18                   311                      3.94                   261
n=6              3.65                   552                      3.23                   475

Table 6-11. Validation results for the “states only” PNLSS models.

                 State Affine                                    State Affine
                 degree 3                                        degree 4
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=5              7.35                   83                       6.98                   119
n=6              5.10                   111                      4.64                   160
n=7              4.41                   143                      3.82                   207
n=8              3.86                   179                      3.13                   260
n=9              3.66                   219                      3.01                   319
n=10             3.56                   263                      2.60                   384

Table 6-12. Validation results for state affine models of degree 3 and 4.


Figure 6-35 shows the spectra of the measured validation output signal (black), the linear

simulation error (light grey), and the nonlinear simulation error (dark grey). In the pass-band

of the device, the nonlinear model pushes down the model error by about 20 dB. Beyond

5 kHz, no significant difference between the linear and the nonlinear model error can be

observed.

Figure 6-34. Validation result for the best nonlinear model (RMSE: 2.6 mV): measured output (black) and model simulation error (grey).

Figure 6-35. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).


6.6.6 Comparison with a Block-oriented Approach

In [68], the same measurements were used to model this electronic circuit using a block-

oriented approach. Both linear blocks of the Wiener-Hammerstein model were identified as a

6th order linear model. The static nonlinearity was parametrized as a 9th degree polynomial.

Taking into account two exchangeable gains between the three blocks, this results in a total of

34 parameters. Furthermore, the RMS value of the simulation error for this model is 3.8 mV.

Hence, we see that this error is reduced by more than 30% to 2.6 mV using the PNLSS/

state affine approach, at the cost of a significantly higher number of parameters.


6.7 Crystal Detector

6.7.1 Description of the DUT

The last modelling challenge discussed in this chapter is an Agilent-HP420C crystal detector

(see Figure 6-36). This kind of device is often used in microwave applications to measure the

envelope of a signal. The RF connection of the crystal detector (the left part in Figure 6-36)

serves as input of the DUT. The video connection of the detector (right part) is considered as

output of the system. From the physical, block-oriented model proposed in [63], we expect to

find a second order relationship between the input and output of this device.

6.7.2 Description of the Experiments

Ir. Liesbeth Gommé from the ELEC Department at the Vrije Universiteit Brussel carried out the

experiments with the crystal detector. She applied two filtered Gaussian noise sequences of

50 000 samples with a growing RMS value as a function of time. Both signals are

superimposed on a DC level of 117 mV, and have a total RMS value of 118 mV. The input

Figure 6-36. Agilent-HP crystal detector (Agilent 423B).

Figure 6-37. Estimation and validation input sequences: (a) Estimation data set, (b) Validation data set.


The input signal of the first data set had a bandwidth of 800 kHz and will be used for estimation

purposes (Figure 6-37 (a)). The second data set had a bandwidth of 400 kHz and will serve as

validation data set (Figure 6-37 (b)). The sampling frequency used in the experiments was

10 MHz. Each sequence was repeated and measured 5 times. This allows us to average the data

and to compute the standard deviation of the noise on the measurements (0.23 mV at the

input, and 0.24 mV at the output of the system).
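The averaging of the five repetitions and the resulting noise level on the averaged data can be computed as in the following sketch (illustrative names; the data array is assumed to hold one repetition per row).

    import numpy as np

    def average_repetitions(y, n_rep=5):
        """Average the repeated measurements of one sequence and estimate the
        standard deviation of the noise on the averaged data."""
        Y = np.asarray(y).reshape(n_rep, -1)
        y_avg = Y.mean(axis=0)
        noise_std = Y.std(axis=0, ddof=1).mean() / np.sqrt(n_rep)
        return y_avg, noise_std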

6.7.3 Best Linear Approximation

To calculate the BLA, we first average the 5 measured periods of the estimation data set in

order to diminish the measurement noise. We then split the data into M = 10 subblocks of 5 000 samples. With equations (2-25) and (2-27), we calculate the estimated BLA G_BLA(jω_k) together with its covariance σ_BLA²(k).

Figure 6-38. BLA of the crystal detector (solid black line); Total standard deviation (dashed black line); 3rd order linear model (solid grey line); Model error (dashed grey line).


In Figure 6-38, G_BLA(jω_k) (solid black line) and the total standard deviation σ_BLA(k) (dashed black line) are plotted. Next, a number of linear models

is estimated using frequency domain subspace identification, followed by a nonlinear

optimization. In Figure 6-38, the third order linear model that was obtained is plotted (solid

grey line), together with the model error (dashed grey line). Next, this linear model is

validated; the result is shown in Figure 6-39. The black line denotes the measured output

signal; the grey line represents the model error. The RMSE of the linear model is 0.89 mV. The

RMS value of the output sequence, without the DC offset, is 15 mV. In the next step, we will

try to reduce the model error using a nonlinear model.

6.7.4 Nonlinear Model

We start by estimating some polynomial nonlinear state space models. We check whether a

linear or a nonlinear output equation needs to be utilized, and whether the input needs to be

included in the nonlinear combinations. Table 6-13 shows the modelling results when all

nonlinear combinations of the states and the input are used (“full” PNLSS, using ξ(t)^{3}).

The left and right column show the results for a nonlinear and linear output equation,

respectively.

Figure 6-39. Validation result for the 3rd order linear model (RMSE: 0.89 mV): measured output (black) and model simulation error (grey).


Next, we estimate polynomial nonlinear state space models that only use nonlinear

combinations of the states (“states only” PNLSS, using x(t)^{3}). The validation results for

these models are summarized in Table 6-14.

Obviously, the best model structure is the one in the left column of Table 6-13. These models

have a nonlinear state and output equation, and use all nonlinear combinations of the states

and the input. The best PNLSS model is the 4th order model in the left column of Table 6-13,

with a RMSE of 0.260 mV. Taking into account the input and output noise levels on the

averaged data (around 0.23 mV), this is a satisfactory result.

Next, we try the state affine approach. The results for state affine models of degree 3 and 4

are shown in Table 6-15. The results here are even better than for the polynomial nonlinear

state space approach: lower validation RMSEs are obtained for a lower number of parameters.

These excellent results seem to contradict the poor performance of the state affine approach

in the Silverbox case study. Both the Silverbox and the crystal detector have been identified as

Nonlinear Feedback systems. Hence, we would expect a similar modelling performance.

However, there is an important difference between both systems: the crystal detector shows a

significantly lower amount of dynamics between its input and output compared to the Silverbox (Figure 6-3 versus Figure 6-38).

                 PNLSS, “full”                                   PNLSS, “full”
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=3              0.267                  127                      0.780                  97
n=4              0.260                  259                      0.308                  209
n=5              0.367                  473                      0.555                  396

Table 6-13. Validation results for the “full” PNLSS models.

                 PNLSS, “states only”                            PNLSS, “states only”
                 nx=[2 3], ny=[2 3]                              nx=[2 3], ny=[]
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=3              0.580                  71                       0.743                  55
n=4              0.277                  159                      0.397                  129
n=5              0.876                  311                      0.440                  261

Table 6-14. Validation results for the “states only” PNLSS models.


Consequently, there is a high resemblance between

the input and output signals for the crystal detector. Therefore, the fact that the nonlinear

behaviour is mainly present at the output poses fewer difficulties for the state affine approach

(for which the approximation relies on polynomials of the input).

The best result obtained with the state affine approach is the 4th order model of degree 3

with a validation RMSE of 0.259 mV. This model was used to generate Figure 6-40, which

shows the measured validation output signal (black) and the simulation error of the best

nonlinear model.

                 State Affine                                    State Affine
                 degree 3                                        degree 4
Model Order      Validation RMSE [mV]   Number of parameters     Validation RMSE [mV]   Number of parameters
n=3              0.275                  39                       0.266                  55
n=4              0.259                  59                       0.260                  84
n=5              0.264                  83                       0.261                  119
n=6              0.262                  111                      0.271                  160
n=7              0.278                  143                      0.266                  207
n=8              0.264                  179                      0.264                  260

Table 6-15. Validation results for state affine models of degree 3 and 4.

Figure 6-40. Validation result for the best nonlinear model (RMSE: 0.259 mV): measured output (black) and model simulation error (grey).


In Figure 6-41, the model errors for the best linear and best nonlinear model are plotted. The

residuals of the linear model mainly consist of a DC offset and large asymmetric spikes at the

end of the sequence, which are not present in the nonlinear model error. Both phenomena

indicate that an even nonlinear behaviour is present in the DUT. This is consistent with the

practical use of the DUT: a squaring characteristic is required for AM-demodulation.


Figure 6-41. Simulation error in the validation test for the best linear (light grey), and best nonlinear (dark grey) model.

Figure 6-42. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).


Furthermore, Figure 6-42 shows the DFT of the measured validation output (black), the linear (light grey) and the nonlinear (dark grey) simulation error. From this plot, we observe that the

nonlinear model reduces the model error in the frequency band between DC and 1 MHz.

6.7.5 Comparison with a Block-oriented Approach

From the analysis of its internal structure, it became clear that the crystal detector can be

represented by a Nonlinear Feedback model. Hence, a block-oriented Nonlinear Feedback

model was estimated for this device in [63], using the same measured data. Contrary to the

Nonlinear Feedback model from Chapter 4, the approach in [63] allows a nonlinearity in the

feedback branch that is dynamic. This nonlinearity takes the form of a Wiener-Hammerstein

structure (see Figure 5-7). In order to separate the feedforward and feedback dynamics

during the estimation, an excitation signal with a linearly increasing amplitude as a function of

time is required. The feedforward branch of the block-oriented model was identified as a 1st

order linear system. The Wiener-Hammerstein structure in the feedback branch consisted of a

1st order linear system, followed by a static nonlinearity that was parametrized as a 9th

degree polynomial. The second linear system was a simple gain factor. Taking into account the

exchangeability of gain factors between the different blocks, the total number of parameters

of the block-oriented model is 13. Furthermore, the simulation error achieved in the validation

test was 0.30 mV. Hence, the PNLSS and the state affine approach perform better, at the price

of a considerably higher number of parameters. Furthermore, the added value of a block-

oriented approach is illustrated in [28]: in this paper, the block-oriented model, which is estimated in the base band, is used to predict the behaviour of the device in the RF band.


CHAPTER 7

CONCLUSIONS


The main goal of this thesis was to study and to model the behaviour of nonlinear systems,

using the Best Linear Approximation and multisine excitation signals. These intermediate tools

were successfully applied for the qualification and quantification of DSP non-idealities. This

was demonstrated in Chapter 3 with some practical examples, including finite word length

effects in digital filtering, and the coding/decoding process of an audio compression codec.

The main advantage of the presented method resides in its simplicity and its general

applicability: no elaborate, deep theoretical analysis is required and the method can be

applied regardless of the functionality of the DSP algorithm.

Chapter 4 has revealed yet another example of how the Best Linear Approximation can be

adopted in the identification of block-oriented models. A straightforward identification

procedure was presented to identify a specific class of Nonlinear Feedback models. It was

successfully applied to real measurements from a physical device.

In Chapter 5, we have shown that when a general, black box model of a nonlinear device is

required, the PNLSS model is a perfect tool to achieve this goal. First, we have illustrated the

extreme flexibility of this model by proving an explicit link with several popular block-oriented

models. In addition, some examples of non-Fading Memory systems were shown to have a

PNLSS representation. Secondly, the ability to cope with multivariable systems is included in a

very natural way. If the user is not interested in physical interpretation of the model

parameters, then block-oriented models can be considered obsolete: the PNLSS model

encompasses them all, at the price of a higher number of parameters. Furthermore, the

PNLSS identification procedure is straightforward, and consists of only three simple steps: (1)

compute the Best Linear Approximation, (2) estimate a linear model, and (3) solve a standard

nonlinear optimization problem. The seven successful practical applications from Chapter 6

confirm the flexibility of the PNLSS model. They demonstrate that the identification procedure

works very well in practice, and is robust on both small (e.g. combine harvester) and large

(e.g. quarter car set-up) data sets. In each of the test cases, a significant model error

reduction was achieved compared with the linear models. One of the main advantages of the

PNLSS approach is that no difficult identification settings have to be chosen by the user during

the identification, such as the number of input and output time lags, the number of neurons,

or the hyperparameter values. An estimate of the model order is easily determined from the

BLA in step (2). Although it is possible to improve the model by tweaking the nonlinear state

and output equations, the standard full PNLSS model usually delivers satisfying results with a

moderate nonlinear degree.


Naturally, there are limitations to the proposed approach. First of all, the PNLSS model is only

suitable to handle low order systems (as a rule of thumb: n_a < 10). For higher order systems, the combinatorial explosion of the number of parameters becomes too restrictive to obtain good modelling results. This problem can be overcome, for instance by restricting the

number of states included in the polynomial expansion, or by imposing a linear relation for a

part of the states. The second disadvantage resides in the nonlinear search during the

estimation of the model parameters: the risk of getting trapped in a local minimum is always

imminent. However, it must be said that this weak spot is common to many identification

methods, even in the case of linear modelling. Finally, it is difficult to guarantee the stability of

the estimated models. But again, the risk of instability is inherent to any recursive model.

Sometimes it is possible to add constraints in order to keep the model stable, but this may

have a negative influence on the modelling performance. These three limitations constitute

interesting topics for future research.

We conclude that the common denominator in this thesis is the KISS principle [87]: Keep it

Simple and Stupid, or rather, Successful. The user is not interested in overwhelming and

complicated mathematical models, or sophisticated identification procedures. He/she does not

want to go astray in many identification options that are difficult to understand. Hence, one of

the main contributions of this thesis is the PNLSS approach: a simple but robust tool, that can

be used to model a broad class of nonlinear systems. To put the KISS concept into action, we

recommend the following work flow:

1. Excite the DUT with a broadband excitation signal that corresponds to normal

operating conditions (bandwidth and RMS value). Preferably, apply several realizations

of a random phase multisine, with two periods per realization.

2. Use the input/output measurements to calculate the BLA as explained in Chapter 2.

3. Estimate a linear model from the BLA and its measured covariance. Select the linear

model with the lowest order, for which there are no significant systematic model

errors.

4. Estimate the standard (full) PNLSS model with a nonlinear degree [2 3] in both the

state and the output equation, and the model order determined in the previous step.

Use all but one multisine realization in this estimation step; the remaining realization is then used for the validation.



REFERENCES

[1] E. W. Bai. An optimal two-stage identification algorithm for Hammerstein-Wiener

nonlinear systems. Automatica, vol. 34, no. 3, pp. 333-338, 1998.

[2] E. W. Bai. Frequency domain identification of Wiener models. Automatica, vol. 39, no. 9,

pp. 1521-1530, 2003.

[3] E. W. Bai. Frequency domain identification of Hammerstein models. IEEE Transactions

on Automatic Control, vol. 48, no. 4, pp. 530-542, 2003.

[4] P. Banelli. Theoretical analysis and performance of OFDM signals in nonlinear fading

channels. IEEE Transactions on Wireless Communications, vol. 2, no. 2, pp. 284-293,

2003.

[5] S. Boyd, L. Chua. Fading Memory and the Problem of Approximating Nonlinear

Operators with Volterra Series. IEEE Transactions on Circuits and Systems, vol. 32,

no. 11, pp. 1150-1161, 1985.

[6] D. K. Campbell. Nonlinear Science from Paradigms to Practicalities. From cardinals to

chaos: reflections on the life and legacy of Stanislaw Ulam. Necia Grant Cooper,

Cambridge University Press, pp. 218, 1989.

[7] S. Chen, S. A. Billings. Representations of non-linear systems: the NARMAX model. Int.

Journal of Control, vol. 49, no. 3, pp. 1013-1032, 1989.

[8] L. Chua, T. Lin. Chaos in Digital Filters. IEEE Transactions on Circuits and Systems,

vol. 35, no. 6, pp. 648-658, 1988.

[9] T. Coen, J. Paduart, J. Anthonis, J. Schoukens, J. De Baerdemaeker. Nonlinear system

identification on a combine harvester. Proceedings of the American Control Conference,

Minneapolis, Minnesota, USA, pp. 3074-3079, 2006.

[10] R. G. Corlis, R. Luus. Use of residuals in the identification and control of two-input,

single-output systems. I&EC Fundamentals, vol. 8, no. 5, pp. 246-253, 1969.

[11] P. Crama. Identification of block-oriented nonlinear models. PhD Thesis, Vrije

Universiteit Brussel, 2004.

[12] R. Crochiere, L. R. Rabiner. Multirate Digital Signal Processing. Prentice-Hall, Englewood

Cliffs, New Jersey, 1983.


[13] R. Dinis, A. Palhau. A class of signal-processing schemes for reducing the envelope

fluctuations of CDMA signals. IEEE Transactions on Communications, vol. 53, no. 5,

pp. 882-889, 2005.

[14] T. D’haene, R. Pintelon, P. Guillaume. Stable Approximations of Unstable Models.

Proceedings of the IEEE Instrumentation and Measurement Technology Conference,

Warsaw, Poland, pp. 1-6, 1-3 May 2007.

[15] T. D’haene, R. Pintelon, J. Schoukens, E. Van Gheem. Variance analysis of frequency

response function measurements using periodic excitations. IEEE Transactions on

Instrumentation and Measurement, vol. 54, no. 4, pp. 1452-1456, 2005.

[16] J. Dieudonné. Foundations of Modern Analysis. Academic Press, New York, 1960.

[17] T. Dobrowiecki, J. Schoukens. Measuring a linear approximation to weakly nonlinear

MIMO systems. Automatica, vol. 43, no. 10, pp. 1737-1751, 2007.

[18] T. Dobrowiecki, J. Schoukens, P. Guillaume. Optimized Excitation Signals for MIMO

Frequency Response Function Measurements. IEEE Transactions on Instrumentation

and Measurement, vol. 55, no. 6, pp. 2072-2097, 2006.

[19] M. Enqvist. Linear Models of Nonlinear Systems. PhD Thesis, Linköping Studies in

Science and Technology, Dissertation no. 985, 2005.

[20] M. Enqvist, J. Schoukens, R. Pintelon. Detection of Unmodeled Nonlinearities Using

Correlation Methods. Proceedings of the IEEE Instrumentation and Measurement

Technology Conference, Warsaw, Poland, pp. 1-6, 1-3 May 2007.

[21] M. Enqvist, L. Ljung. Linear approximations of nonlinear FIR systems for separable input

processes. Automatica, vol. 41, no. 3, pp. 459-473, 2005.

[22] M. Espinoza, K. Pelckmans, L. Hoegaerts, J. Suykens, B. De Moor. A comparative study

of LS-SVMs applied to the silver box identification problem. Proceedings of the IFAC

Symposium on Nonlinear Control Systems, Stuttgart, Germany, pp. 513-518, 1-3

September 2004.

[23] M. Espinoza. Structured Kernel Based Modeling and its Application to Electric Load

Forecasting. Department of Electrical Engineering, KULeuven, 2006.

[24] P. Eykhoff. System Identification. Parameter and State Estimation. Wiley, New York,

1974.

[25] R. Fletcher. Practical Methods of Optimization (Second Edition). Wiley, New York, 1991.


[26] M. Fliess, D. Normand-Cyrot. On the approximation of nonlinear systems by some

simple state-space models. Proceedings of the IFAC Identification and Parameter

Estimation Conference, Washington DC, USA, vol. 1, pp. 511-514, 7-11 June 1982.

[27] G. H. Golub, C. F. Van Loan. Matrix Computations - Third Edition. The John Hopkins

University Press, London, 1996.

[28] L. Gommé, J. Schoukens, Y. Rolain, W. Van Moer. Validation of a crystal detector model

for the calibration of the Large Signal Network Analyzer. Proceedings of the

Instrumentation and Measurement Technology Conference, Warsaw, Poland, pp. 1-6, 1-

3 May 2007.

[29] H. Hjalmarsson, J. Schoukens. On direct identification of physical parameters in non-

linear models. Proceedings of the IFAC Symposium on Nonlinear Control Systems,

Stuttgart, Germany, pp. 519-524, 1-3 September 2004.

[30] I. W. Hunter, M. J. Korenberg. The Identification of Nonlinear Biological Systems: Wiener

and Hammerstein Cascade Models. Biological Cybernetics, vol. 55, no. 2-3, pp. 135-144,

1986.

[31] N. Jayant, J. Johnston, R. Safranek. Signal compression based on models of human

perception. Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, 1993.

[32] Z. P. Jiang, Y. Wang. Input-to-state stability for discrete-time nonlinear systems.

Automatica, vol. 37, no. 6, pp. 857-869, 2001.

[33] H. K. Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, New Jersey, Second

Edition, 1996.

[34] D. E. Knuth. The Art of Computer Programming Vol. I. Addison-Wesley, pp. 70, 1968.

[35] Kreider, Kuller, Ostberg, Perkins. An Introduction to Linear Analysis. Addison-Wesley

Publishing Company Inc, 1966.

[36] K. Levenberg. A method for the solution of certain problems in least squares. Quart.

Appl. Math., vol. 2, pp. 164-168, 1944.

[37] I. J. Leontaritis, S. A. Billings. Input-output parametric models for non-linear systems.

Part I: deterministic non-linear systems. International Journal of Control, vol. 41, no. 2,

pp. 303-344, 1985.

[38] L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River,

New Jersey, Second Edition, 1999.


[39] L. Ljung, Q. Zhang, P. Lindskog, A. Juditski. Modeling a non-linear electric circuit with

black box and grey box models. Proceedings of the IFAC Symposium on Nonlinear

Control Systems, Stuttgart, Germany, pp. 543-548, 1-3 September 2004.

[40] E. N. Lorenz. Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences,

vol. 20, no. 2, pp. 130-141, 1963.

[41] O. Markusson. Model and System Inversion with Applications in Nonlinear System

Identification and Control. S3-Automatic Control, Royal Institute of Technology, 2002.

[42] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM

Journal of Applied Mathematics, vol. 11, pp. 431-441, 1963.

[43] V. J. Mathews, G. L. Sicuranza. Polynomial Signal Processing. A Wiley-Interscience

Publication, 2000.

[44] T. McKelvey, H. Akçay, L. Ljung. Subspace-based multivariable system identification from

frequency response data. IEEE Transactions on Automatic Control, vol. 41, no. 7,

pp. 960-979, 1996.

[45] T. McKelvey, A. Helmersson, T. Ribarits. Data driven Local Coordinates for multivariable

linear systems and their application to system identification. Automatica, vol. 40, no. 9,

pp. 1629-1635, 2004.

[46] R. R. Mohler, W. D. Kolodziej. An overview of bilinear system theory and applications.

IEEE Transactions on Systems, Man and Cybernetics, vol. 10, no. 10, pp. 683-688,

1980.

[47] K. S. Narendra, K. Parthasarathy. Identification and control of dynamical systems

using neural networks. IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4-27,

1990.

[48] J. G. Németh. Identification of Nonlinear Systems using Interpolated Volterra Models.

PhD Thesis, Vrije Universiteit Brussel, 2003.

[49] S. J. Norquay, A. Palazoglu, J. Romagnoli. Application of Wiener Model Predictive Control

to a pH Neutralization Experiment. IEEE Transactions on Control Systems Technology,

vol. 7, no. 4, pp. 437-445, 1999.

[50] A. V. Oppenheim, R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, pp. 335-

373 and pp. 296, 1989.

[51] M. J. L. Orr. Radial Basis Function Networks Toolbox.

http://www.anc.ed.ac.uk/rbf/rbf.html, 1999.


[52] J. Paduart, G. Horvath, J. Schoukens. Fast identification of systems with nonlinear

feedback. Proceedings of the IFAC Symposium on Nonlinear Control Systems, Stuttgart,

Germany, pp. 525-530, 1-3 September 2004.

[53] R. Pintelon. Frequency-domain subspace system identification using non-parametric

noise models. Automatica, vol. 38, no. 8, pp. 1295-1311, 2002.

[54] R. Pintelon, Y. Rolain, W. Van Moer. Probability density function for frequency response

function measurements using periodic signals. IEEE Transactions on Instrumentation

and Measurement, vol. 52, no. 1, pp. 61-68, 2003.

[55] R. Pintelon, J. Schoukens. Measurement of frequency response functions using periodic

excitations, corrupted by correlated input/output errors. IEEE Transactions on

Instrumentation and Measurement, vol. 50, no. 6, pp. 1753–1760, 2001.

[56] R. Pintelon, J. Schoukens. System Identification. A Frequency domain approach. IEEE

Press, New Jersey, 2001.

[57] R. Pintelon, G. Vandersteen, L. De Locht, Y. Rolain, J. Schoukens. Experimental

Characterization of Operational Amplifiers: a System Identification Approach. IEEE

Transactions on Instrumentation and Measurement, vol. 53, no. 3, pp. 854-876, 2004.

[58] S. Prakriya, D. Hatzinakos. Blind Identification of LTI-ZMNL-LTI Nonlinear Channel

Models. IEEE Transactions on Signal Processing, vol. 43, no. 12, pp. 3007-3013, 1995.

[59] W. J. Rugh. Nonlinear System Theory, The Volterra/Wiener Approach. The John Hopkins

University Press, 1981.

[60] M. Schetzen. The Volterra and Wiener Theories of Nonlinear Systems. Wiley, New York,

1980.

[61] J. Schoukens. Parameterestimatie in Lineaire en Niet-Lineaire Systemen met Behulp van

Digitale Tijdsdomein Metingen. PhD Thesis, Vrije Universiteit Brussel, 1985.

[62] J. Schoukens, T. Dobrowiecki, R. Pintelon. Identification of linear systems in the

presence of nonlinear distortions. A frequency domain approach. IEEE Transactions on

Automatic Control, vol. 43, no. 2, pp. 176-190, 1998.

[63] J. Schoukens, L. Gommé, W. Van Moer, Y. Rolain. Identification of a crystal detector

using a block structured nonlinear feedback model. Proceedings of the Instrumentation

and Measurement Technology Conference, Warsaw, Poland, pp. 1-6, 1-3 May 2007.

[64] J. Schoukens, R. Pintelon, T. Dobrowiecki, Y. Rolain. Identification of linear systems with

nonlinear distortions. Automatica, vol. 41, no. 3, pp. 491-504, 2005.


[65] J. Schoukens, Y. Rolain, R. Pintelon. Analysis of windowing/leakage effects in frequency

response function measurements. Automatica, vol. 42, no. 1, pp. 27-38, 2006.

[66] J. Schoukens, J. Swevers, R. Pintelon, H. Van der Auweraer. Excitation design for FRF

measurements in the presence of nonlinear distortions. Proceedings of the ISMA2002

Conference, Leuven, Belgium, vol. 2, pp. 951-958, 16-18 September 2002.

[67] J. Schoukens, G. Nemeth, P. Crama, Y. Rolain, R. Pintelon. Fast approximate

identification of nonlinear systems. Automatica, vol. 39, no. 7, pp. 1267-1274, 2003.

[68] J. Schoukens, R. Pintelon, J. Paduart, G. Vandersteen. Nonparametric Initial Estimates

for Wiener-Hammerstein Systems. Proceedings of the 14th IFAC Symposium on System

Identification, Newcastle, Australia, pp. 778-783, 29-31 March 2006.

[69] A. Schrempf. Identification of Extended State-Affine Systems. PhD Thesis, Johannes

Kepler Universität, 2004.

[70] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H.

Hjalmarsson, A. Juditsky. Nonlinear black-box modeling in system identification: a

unified overview. Automatica, vol. 31, no. 12, pp. 1691-1724, 1995.

[71] J. Sjöberg. On estimation of nonlinear black box models: How to obtain a good

initialization. Proceedings of the 1997 IEEE Workshop Neural Networks for Signal

Processing VII, Amelia Island Plantation, Florida, pp. 72-81, 1997.

[72] K. Smolders, M. Witters, J. Swevers, P. Sas. Identification of a Nonlinear State Space

Model for Control using a Feature Space Transformation. Proceedings of the ISMA2006

Conference, Leuven, Belgium, pp. 3331-3342, 18-20 September 2006.

[73] E. D. Sontag. Realization theory of discrete-time nonlinear systems: Part I: The

bounded case. IEEE Transactions on Circuits and Systems, vol. 26, no. 5, pp. 342-356,

1979.

[74] E. D. Sontag. Smooth stabilization implies coprime factorization. IEEE Transactions on

Automatic Control, vol. 34, no. 4, pp. 435-443, 1989.

[75] T. Söderström, P. Stoica. System Identification. Prentice Hall, Englewood Cliffs, 1989.

[76] L. Sragner, J. Schoukens, G. Horvath. Modeling of slightly nonlinear systems: a neural

network approach. Proceedings of the IFAC Symposium on Nonlinear Control Systems,

Stuttgart, Germany, pp. 531-536, 1-3 September 2004.


[77] J. A. K. Suykens, B. De Moor, J. Vandewalle. Nonlinear System Identification using neural state space models, applicable to robust control design. International Journal of Control, vol. 62, no. 1, pp. 129-152, 1995.

[78] J. A. K. Suykens, J. Vandewalle, B. De Moor. Artificial Neural Networks for Modelling and Control of Non-Linear Systems. Kluwer Academic Publishers, 1996.

[79] H. Unbehauen, G. P. Rao. A review of identification in continuous-time systems. Annual Reviews in Control, vol. 22, pp. 145-171, 1998.

[80] V. Verdult. Nonlinear System Identification: A State-Space Approach. PhD Thesis, University of Twente, 2002.

[81] V. Verdult. Identification of local linear state-space models: the silver-box case study. Proceedings of the IFAC Symposium on Nonlinear Control Systems, Stuttgart, Germany, pp. 537-542, 1-3 September 2004.

[82] G. Vandersteen. Identification of linear and nonlinear systems in an errors-in-variables least squares and total least squares framework. PhD Thesis, Vrije Universiteit Brussel, 1997.

[83] E. W. Weisstein. Multinomial Series. From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/MultinomialSeries.html

[84] P. D. Welch. The use of fast Fourier transforms for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70-73, 1967.

[85] E. Wernholt, S. Gunnarsson. Detection and Estimation of Nonlinear Distortions in Industrial Robots. Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Sorrento, Italy, pp. 1913-1918, 24-27 April 2006.

[86] A. G. Wills, B. Ninness. On Gradient-Based Search for Multivariable System Estimates. IEEE Transactions on Automatic Control, vol. 52, no. 12, pp. 1-8, 2007.

[87] http://en.wikipedia.org/wiki/KISS_principle

[88] http://lame.sourceforge.net

PUBLICATION LIST

Journal papers

• J. Paduart, J. Schoukens, Y. Rolain. Fast Measurement of Quantization Distortions in DSP Algorithms. IEEE Transactions on Instrumentation and Measurement, vol. 56, no. 5, pp. 1917-1923, 2007.

Conference papers

• J. Paduart, J. Schoukens. Fast Identification of systems with nonlinear feedback. Proceedings of the 6th IFAC Symposium on Nonlinear Control Systems, Stuttgart, Germany, pp. 525-529, 2004.

• J. Paduart, J. Schoukens, L. Gommé. On the Equivalence between some Block-oriented Nonlinear Models and the Nonlinear Polynomial State Space Model. Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Warsaw, Poland, pp. 1-6, 2007.

• J. Paduart, J. Schoukens, R. Pintelon, T. Coen. Nonlinear State Space Modelling of Multivariable Systems. Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp. 565-569, 2006.

• J. Paduart, J. Schoukens, K. Smolders, J. Swevers. Comparison of two different nonlinear state-space identification algorithms. Proceedings of the International Conference on Noise and Vibration Engineering, Leuven, Belgium, pp. 2777-2784, 2006.

• J. Schoukens, J. Swevers, J. Paduart, D. Vaes, K. Smolders, R. Pintelon. Initial estimates for block structured nonlinear systems with feedback. Proceedings of the International Symposium on Nonlinear Theory and its Applications, Brugge, Belgium, pp. 622-625, 2005.

• J. Schoukens, R. Pintelon, J. Paduart, G. Vandersteen. Nonparametric Initial Estimates for Wiener-Hammerstein systems. Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp. 778-783, 2006.

• T. Coen, J. Paduart, J. Anthonis, J. Schoukens, J. De Baerdemaeker. Nonlinear system identification on a combine harvester. Proceedings of the American Control Conference, Minneapolis, Minnesota, USA, pp. 3074-3079, 2006.