Parametric Time Domain Modelling



Mechanical Systems and Signal Processing 20 (2006) 763–816

Invited Survey

Parametric time-domain methods for non-stationary random vibration modelling and analysis: A critical survey and comparison

    A.G. Poulimenos, S.D. Fassois

    Stochastic Mechanical Systems & Automation (SMSA) Laboratory, Department of Mechanical & Aeronautical Engineering,

    University of Patras, GR 265 00 Patras, Greece

    Received 16 March 2005; received in revised form 13 October 2005; accepted 18 October 2005

    Available online 27 December 2005

    Abstract

A critical survey and comparison of parametric time-domain methods for non-stationary random vibration modelling and analysis based upon a single vibration signal realization is presented. The considered methods are based upon time-dependent autoregressive moving average (TARMA) representations, and may be classified as unstructured parameter evolution, stochastic parameter evolution, and deterministic parameter evolution. The main methods within each class are presented, and model structure selection is discussed. The methods are compared, via a Monte Carlo study, in terms of achievable model parsimony, prediction accuracy, power spectral density and modal parameter accuracy and tracking, computational simplicity, and ease of use. Comparisons with basic non-parametric methods are also made. The results of the study confirm the advantages and high performance characteristics of parametric methods. They also confirm the increased accuracy and performance characteristics of the deterministic, as well as stochastic, parameter evolution methods over those of their unstructured parameter evolution counterparts.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Non-stationary vibration; Estimation and identification; Time series analysis; Vibration analysis; Time-varying systems; Modal analysis; Spectral analysis; Time-frequency analysis; Time-dependent ARMA models

    Contents

Acronyms
Important conventions and symbols
1. Introduction
2. Non-stationary signal representations
2.1. Parameterized representations


    www.elsevier.com/locate/jnlabr/ymssp

0888-3270/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

    doi:10.1016/j.ymssp.2005.10.003

Research supported by the VolkswagenStiftung Grant no. I/76938.
Corresponding author. Tel./fax: +30 2610 997 405 (direct); 2610 997 130 (central).

    E-mail address: [email protected] (S.D. Fassois).

    URL: http://www.mech.upatras.gr/$sms.


2.1.1. Unstructured parameter evolution TARMA representations
2.1.2. Stochastic parameter evolution TARMA representations
2.1.3. Deterministic parameter evolution TARMA representations
3. Non-stationary TARMA model identification
3.1. Model parameter estimation
3.1.1. Unstructured parameter evolution TARMA models
3.1.2. Stochastic parameter evolution TARMA models
3.1.3. Deterministic parameter evolution TARMA models
3.2. Model structure selection
3.2.1. Search schemes for locating the best fitness model
3.3. Model validation
4. Model-based analysis
4.1. The impulse response function
4.2. The autocovariance function
4.3. Time-frequency distributions
5. Application of the methods to non-stationary vibration modelling and analysis
5.1. The underlying system and the resulting non-stationary vibration
5.1.1. The theoretical non-stationary signal and the system properties
5.1.2. Simulation and the resulting non-stationary vibration signal
5.2. The random vibration modelling problem
5.3. Non-stationary vibration modelling results
5.3.1. Discussion
5.4. Model-based analysis
5.4.1. 'Frozen' analysis: all estimated models
5.4.2. Further analysis: the FS-TARMA (2SLS-PE) model
5.5. Summary of the results
6. Concluding remarks
Acknowledgements
Appendix A. List of symbols
A.1. Greek symbols
References

    Acronyms

AIC: Akaike information criterion
AR: autoregressive
ARMA: autoregressive moving average
BIC: Bayesian information criterion
FS-TAR: functional series TAR (model)
FS-TARMA: functional series TARMA (model)
KF: Kalman filter
MA: moving average
NID: normally independently distributed
OLS: ordinary least squares
P-A: polynomial-algebraic (method)
PE: prediction error (method)
RELS: recursive extended least squares (method)
RML: recursive maximum likelihood (method)
RML-TARMA: RML-estimated TARMA (model)
RSS: residual sum of squares
SP-TARMA: smoothness priors TARMA (model)
ST-ARMA: short-time ARMA
STFT: short-time Fourier transform
TAR: time-dependent AR (model)
TARMA: time-dependent ARMA (model)
2SLS: two stage least squares (method)

    Important conventions and symbols

A functional argument in parentheses designates function of a real variable; for instance x(t) is a function of analog time t ∈ R.


A functional argument in brackets designates function of an integer variable; for instance x[t] is a function of normalized discrete time t = 1, 2, .... The conversion from discrete normalized time to analog time is based upon (t − 1)·Ts, with Ts standing for the sampling period.

A time instant used as superscript to a function indicates the set of values of the function up to that time instant; for instance x^t ≜ {x[i], i = 1, 2, ..., t}.

A hat designates estimator/estimate of the indicated quantity; for instance θ̂ is an estimator/estimate of θ. For simplicity of notation, no distinction is made between a random variable and its value(s).

B stands for the backshift operator, defined such that B^k · x[t] ≜ x[t − k].

    1. Introduction

    Non-stationary random vibration is that characterized by time-dependent (evolutionary) characteristics (see,

for instance, [1, 2, Chapter 7, 3, Chapter 12, 4, pp. 211–219, 5, Chapter 8, 6,7]). It is frequently encountered in

    practice, with typical examples including earthquake-excited vibration, vibration of surface vehicles, airborne

    structures, sea vessels, robotic devices, rotating machinery and so on.

    From a physical standpoint, non-stationary vibration is due to time-dependent and/or inherently non-linear

dynamics. An example of such vibration is presented in Fig. 1. The structural system considered consists of a


    Fig. 1. Laboratory non-stationary system: (a) schematic diagram of the laboratory setup, (b) the non-stationary random vibration signal,

    and (c) its parametrically estimated time-dependent power spectral density [8,10].


steel beam, clamped close to both its ends on two vertical stands, and a steel cylindrical mass traversing it from

    one end to the other (time-dependent mass distribution). The beam is subject to vertical stationary zero-mean

    Gaussian random force excitation. Due to the motion of the mass, the resulting vertical vibration is

    non-stationary, with variance non-stationarity being evident in the time-domain plot of Fig. 1(b). Its

    parametrically estimated time-dependent power spectral density function is a function of both time and

frequency, and is depicted in Fig. 1(c). Non-stationarity is evident from this figure; observe the time evolution of the three spectral peaks appearing within the considered frequency range. For further details the reader is referred to Refs. [8–10].

    From a mathematical point of view, non-stationary random vibration is characterized by time-dependent

    statistical moments. Confining attention to the first two moments which completely define the probability

    distribution in the Gaussian case the mean (1st moment) and autocovariance (2nd moment) are of the

    following respective forms:

    mt Efxtg Z1

    1xt fxt dxt, (1)

    gt1; t2 Efxt1 xt2g Z1

    1 Z1

    1

    xt1 xt2 fxt1; xt2 dxt1 dxt2 (2)

with x(t) designating the random vibration signal, which is a function of analog time t (the argument in parentheses designating function of a real variable), E{·} statistical expectation and f(·) the probability density function. Observe that, unlike in the stationary case, the mean is, in general, a function of time, and the autocovariance a function of the two considered time instants. It is oftentimes convenient to think of the autocovariance as being of a local nature, relating values of the signal around a time instant t. In that case γ(·,·) is typically expressed as

\gamma(t + \tau/2,\, t - \tau/2) = E\{x(t + \tau/2)\, x(t - \tau/2)\} = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x(t + \tau/2)\, x(t - \tau/2)\, f(x(t + \tau/2), x(t - \tau/2))\, \mathrm{d}x(t + \tau/2)\, \mathrm{d}x(t - \tau/2) \qquad (3)

with t + τ/2 ≜ t_1 and t − τ/2 ≜ t_2.

The present survey focuses on Gaussian zero-mean random vibration with non-stationary and continually

    evolving autocovariance function. The zero-mean assumption is adopted because in many random vibration

    analysis problems the mean is either zero or constant (independent of time). In the latter case it may be (in an

    initial stage) estimated and subsequently subtracted from the signal. The case of a time-dependent mean (also

    referred to as a deterministic trend function) may be also treated in an initial stage via proper techniques, such

    as curve fitting or high pass signal filtering [11, Section 8.3]. The continual evolution assumption for the

    autocovariance function is adopted because this is the typical case in vibration analysis, where a continual

    evolution of the dynamics is encountered.

The problem of non-stationary vibration modelling (identification) and analysis based upon equispaced-in-time, digitally sampled vibration signal measurements x[t], of finite length N (the use of brackets designates function of an integer variable, presently discrete time t = 1, 2, ..., N), obtained from a single experiment

    (single realization) is the main theme considered. The general problem of modelling based upon signal

    measurements is referred to as an identification problem (for instance see [12,13]) and in a broader sense is

    a type of inverse problem [14]. An obtained model is to represent, in some sense, the underlying structural

    dynamics and is, in general, of the form of an appropriate stochastic difference equation. It is tacitly assumed

    that the measured signal actually obeys a similar (but unknown) mathematical expression, which reflects the

    underlying structural dynamics and is subsequently referred to as the actual or true model or representation.

    At this point it is useful to note that the terms model and representation are used interchangeably. Vibration

    analysis then focuses on the extraction of physically meaningful information from the obtained model. This

    may include the signal moments, but more often a form of time-dependent power spectral density function and

    the time-dependent vibration modes [4, p. 218, 5, Chapter 8, 6,15,16]. In addition to analysis, an obtained

    model may be also used for purposes of prediction, fault diagnosis, classification and control (though these

issues are not discussed in the present paper; the interested reader is referred to references such as [17–23]).


    Non-stationary random vibration modelling and analysis based upon available signal measurements has

received significant attention in recent years; see, for instance, Refs. [6,7,15,16,19,24–35]. The available

    methods may be broadly classified as parametric or non-parametric.

    Non-parametric methods have received most of the attention thus far, and are based upon non-

parameterized representations of the vibration energy as a simultaneous function of time and frequency (time-frequency distributions). These methods include the classical, though largely empirical, spectrogram (based upon the short-time Fourier transform, STFT) and its ramifications [4, p. 218, 6, 3, p. 504, 30], Mark's physical spectrum [36, 5, Section 8.3], the Cohen class of distributions [6,19,31,37–39], Priestley's evolutionary spectrum [1, Section 6.3, 5, Section 8.4], as well as wavelet-based methods [4, Chapter 17, 25,15,28,22].

    Parametric methods are, on the other hand, based upon parameterized representations of the time-

    dependent autoregressive moving average (TARMA) or related types and their extensions (for instance

TARMAX models, that is TARMA models with exogenous excitations, which are used in accounting for

    measurable excitations causing the observed vibration response). These representations differ from their

    conventional, stationary, counterparts in that their parameters are time-dependent (for instance, see

[40–42,16,29,34]). The methods based upon them are known to offer a number of potential advantages, such as (for instance see [20,27,16,29,33,43,8,23,35]): (i) representation parsimony, as models may be potentially specified by a limited number of parameters; (ii) improved accuracy; (iii) improved resolution; (iv) improved tracking of the time-varying dynamics; (v) flexibility in analysis, as parametric methods are capable of directly capturing the underlying structural dynamics responsible for the non-stationary

    behaviour; (vi) flexibility in synthesis (simulation) and prediction, as they are more suitable for both purposes;

    (vii) flexibility in fault diagnosis, as they allow for the use of the broad class of parametric diagnosis

    techniques; and, (viii) flexibility in control, for which they are also particularly suitable.

    Parametric methods may be further classified according to the type of structure imposed upon the

    evolution of the time-varying model parameters. The main resulting classes are, in the order of increasing

    structure imposed upon the time-varying parameters, as follows:

    (a) The class of unstructured parameter evolution methods, which impose no particular structure upon the

    evolution of the time-varying parameters. Prime methods within this class include the short-time ARMA

(ST-ARMA) method (for instance see [44, pp. 79–82, 45,29,32]) and the group of recursive methods, including the recursive maximum likelihood (RML) method (for instance see [46,42,47,35], and in a more general context [48,49, 13, Chapter 9, 12, Chapter 11]; recursive methods are also discussed and

    compared in Refs. [50,51]). For alternative methods see [26].

    (b) The class of stochastic parameter evolution methods, which impose stochastic structure upon the

    evolution of the time-varying parameters via stochastic smoothness constraints. Methods in this class have

    been used primarily for the modelling and analysis of earthquake ground motion signals (for instance in

references [52–54,40,24,7]).

    (c) The class of deterministic parameter evolution methods which impose deterministic structure upon the

    evolution of the time-varying parameters. These methods are of the functional series TAR and TARMA

    (FS-TAR and FS-TARMA) types, and represent the evolution of the model parameters by deterministic

    functions belonging to specific functional subspaces [44, Chapter 6]. FS-TAR methods have been used for

    the modelling and simulation of earthquake ground motion [55] and vibration analysis in rotating

machinery [27] (in a broader context they were pioneered in Refs. [56–58] and later in references such as [59–62, 44, Chapter 6, 63,64]). FS-TARMA methods have been broadly developed in references [65–68],

    and, among others, applied to the modelling and prediction of power consumption in an automobile active

    suspension [20], the modelling and analysis of simulated robot vibration [16], the modelling, analysis and

    simulation of earthquake ground motion [33], and the modelling and vibration analysis of the bridge-

    like laboratory structure of Fig. 1 [810].

    Unstructured parameter evolution methods are characterized by low parsimony (model parametrization

    economy), as the complete description of a non-stationary signal requires knowledge of the model

    parameters at each time instant. They are mainly capable of tracking slow evolutions in the dynamics.

    Stochastic parameter evolution methods achieve low parsimony as well, as knowledge of the model


    parameters at each time instant is also required. At the same time, they may still leave an unnecessarily high

    number of degrees of freedom in the parameter evolution. They are mainly capable of tracking slow and

    medium evolutions in the dynamics. Deterministic parameter evolution methods achieve high parsimony, as

    they use a limited number of parameters for the complete signal description (they basically provide a global

    representation of parameter evolution). Through proper selection of their functional subspaces, they are

    capable of tracking fast or slow evolutions in the dynamics (see the comments in [44, p. 215]). A summary of

    these features is, for the various families of parametric methods, provided in Table 1.

    The aim of the present paper is two-fold: (1) a critical overview of parametric time-domain methods for non-stationary random vibration modelling and analysis, and (2) the application and comparative assessment of

    the methods, via Monte Carlo experiments, to a simulated non-stationary random vibration signal with

    precisely known characteristics.

    The various facets, capabilities, and limitations of the methods are examined, while certain new results are

    also presented. Particular emphasis is placed upon the assessment of the following characteristics: (a) model

    parsimony (representation simplicity), (b) achievable modelling accuracy in terms of model-based one-step-

    ahead predictions, (c) achievable time-dependent power spectral density and modal parameter accuracy,

    resolution and tracking, (d) computational simplicity, and (e) ease of use.

    The rest of the paper is organized as follows: non-stationary signal representations are presented in

    Section 2, where certain non-parameterized representations are also mentioned. The various parameterized

    representations are, along with the corresponding families of modelling/estimation methods, reviewed in

Section 3. Model-based vibration analysis, mainly in terms of the time-dependent power spectral density function and the modal parameters, is discussed in Section 4. The application of the methods to a simulated

    non-stationary random vibration signal, and their comparative assessment via a Monte Carlo study, are

    presented in Section 5. Finally, concluding remarks are summarized in Section 6.

    2. Non-stationary signal representations

Non-stationary signal representations may be of the parameterized or non-parameterized type. Parameterized

    representations attempt to model a non-stationary signal via a description that may be specified via a more or

    less limited number of parameters. As already indicated, they are based upon conceptual extensions of the

ARMA representations of the stationary case [17, pp. 138–141, 18, p. 77], and, being the focus of this paper,

    are presented in detail in the next subsection.

    Non-parameterized representations may be based upon the impulse response (or weighting) function, the

autocovariance function, or representations aiming at describing the signal's power spectral density (signal's power) over its frequency range at every time instant; this defining the concept of a time-frequency distribution. Time-frequency representations are, by far, the most widely used today. A few of them are, for

    purposes of completeness and comparison, briefly presented in the sequel (also see the discussion of Section 4

    on the computation of non-parameterized representations based upon their estimated parameterized

    counterparts). For a detailed overview of non-parametric methods the interested reader is referred to [6]

    and the references therein.

Time-frequency representations may be based upon the concept of local autocovariance (Eq. (3)) and

    the conventional definition of the (stationary) power spectral density as the Fourier transform of the

    autocovariance function (for instance see [1, p. 5, 5, p. 49]).


    Table 1

    Main characteristics of parametric non-stationary methods

    Parameter evolution Representation parsimony Dynamics evolution

    Unstructured Low Slow

    Stochastic Low Slow/medium

    Deterministic High Slow/medium/fast


It is often useful to introduce the concepts by first presenting the spectrogram. This is based upon the STFT of the signal and constitutes one of the simplest and most widely used, though largely empirical, forms. It is obtained by applying the Fourier transform on a window 'sliding' along the vibration signal and thus selecting a segment of it at each time. The spectrogram of the non-stationary signal x(t) (note that the continuous time domain is presently used) is thus expressed as [3, p. 504, 6]:

S_{spg}(\omega, t) = \left| \int_{-\infty}^{+\infty} w(t - t')\, x(t')\, e^{-j\omega t'}\, \mathrm{d}t' \right|^2 \qquad (4)

with ω designating frequency in rad/time unit, j the imaginary unit, w(·) the selected window, and |·| complex magnitude. Note that, strictly speaking, the spectrogram is not a signal representation, but rather an estimator of some (though not clearly defined) underlying quantity.
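As an illustration, a minimal numerical sketch of a spectrogram computation along the lines of Eq. (4) is given below; a NumPy environment is assumed, and the window type, segment length, step and the chirp-like test signal are arbitrary example choices rather than settings used in the present study.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, step=64):
    """Sliding-window squared-magnitude STFT, a discrete counterpart of Eq. (4)."""
    window = np.hanning(win_len)                      # the window w(.) of Eq. (4)
    starts = np.arange(0, len(x) - win_len + 1, step) # window positions along the signal
    S = np.array([np.abs(np.fft.rfft(window * x[s:s + win_len]))**2
                  for s in starts]).T                 # rows: frequency, columns: time
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)      # frequency axis (Hz)
    times = (starts + win_len // 2) / fs              # centre of each segment (s)
    return S, freqs, times

# usage on a hypothetical non-stationary test signal (frequency-swept sine plus noise)
fs = 1000.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * (50 + 10 * t) * t) + 0.1 * np.random.randn(len(t))
S, f, tau = spectrogram(x, fs)
```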

A formal representation is provided by the Wigner-Ville distribution, according to which the time-frequency distribution is defined as the Fourier transform, with respect to τ, of the local autocovariance function γ(t + τ/2, t − τ/2) (the time t treated as fixed) [1, pp. 143, 159, 5, pp. 145–146, 37, p. 114, 6, 3, p. 504]:

S_{WV}(\omega, t) = \int_{-\infty}^{+\infty} \gamma(t + \tau/2,\, t - \tau/2)\, e^{-j\omega\tau}\, \mathrm{d}\tau. \qquad (5)

A unified framework for the various such representations has been established by Cohen [37,6], and the time-frequency distributions within it are referred to as the Cohen class of distributions. Their general form is as follows:

S_C(\omega, t) = \int_{-\infty}^{+\infty} \tilde{\gamma}(t + \tau/2,\, t - \tau/2)\, e^{-j\omega\tau}\, \mathrm{d}\tau = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} g(u - t,\, \tau)\, \gamma(u + \tau/2,\, u - \tau/2)\, e^{-j\omega\tau}\, \mathrm{d}u\, \mathrm{d}\tau \qquad (6)

with γ̃(t + τ/2, t − τ/2) designating a generalized local autocovariance which takes into account values of the autocovariance at neighbouring time instants, but in a properly weighted fashion that still concentrates attention on the current time instant t. g(·,·) is the weighting (kernel) function. Note that this kernel function may also vary with the lag τ. It should be further noted that the previously mentioned time-frequency distributions are members of the Cohen class. Indeed, notwithstanding the statistical expectation operator, the spectrogram is obtained by selecting g(t, τ) = w(t + τ/2)·w(t − τ/2), and the Wigner-Ville distribution is obtained by selecting g(t, τ) = δ(t), with δ(t) designating the Dirac delta function.
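By way of comparison, a rough discrete-time sketch of the (lag-windowed) Wigner-Ville computation of Eq. (5) follows, again assuming NumPy; the maximum lag is an arbitrary choice, the statistical expectation is replaced by the raw instantaneous product, and the resulting frequency axis is scaled by a factor of two relative to ordinary DFT bins because the effective lag step is two sampling periods.

```python
import numpy as np

def pseudo_wigner_ville(x, max_lag=64):
    """Discrete pseudo Wigner-Ville distribution: FFT over the lag of x[t+tau]*x[t-tau]."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n_freq = 2 * max_lag
    S = np.zeros((max_lag, N))
    for t in range(N):
        r = np.zeros(n_freq)
        for tau in range(-max_lag + 1, max_lag):
            if 0 <= t + tau < N and 0 <= t - tau < N:
                r[tau % n_freq] = x[t + tau] * x[t - tau]   # local (symmetric) product
        S[:, t] = np.real(np.fft.fft(r))[:max_lag]          # transform w.r.t. the lag
    return S
```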

    2.1. Parameterized representations

    Parameterized representations typically are of the TARMA type or proper extensions (for instance

TARMAX representations, that is TARMA representations with eXogenous excitations, which additionally account for measurable excitations [69,34,23,9]). These representations resemble their conventional, stationary, ARMA counterparts, with the significant difference being that they allow their parameters to depend upon time [40,41,44,34]. A TARMA(na, nc) model, with na, nc designating its autoregressive (AR) and moving average (MA) orders, respectively, is thus of the form (note that normalized discrete time is henceforth used):

x[t] + \underbrace{\sum_{i=1}^{n_a} a_i[t]\, x[t-i]}_{\text{AR part}} = e[t] + \underbrace{\sum_{i=1}^{n_c} c_i[t]\, e[t-i]}_{\text{MA part}}, \qquad e[t] \sim \mathrm{NID}\big(0, \sigma_e^2[t]\big) \qquad (7)

with t designating normalized discrete time (absolute time normalized by the sampling period), x[t] the non-stationary signal to be modelled, e[t] an (unobservable) uncorrelated (white) innovations sequence with zero mean and time-dependent variance σ_e²[t] that generates x[t], and a_i[t], c_i[t] the model's time-dependent AR and MA parameters, respectively. NID(·,·) stands for normally independently distributed with the indicated mean and variance. This TARMA form is often referred to as 'shifted', and is typically defined within a specific time interval, say [t_o, t_f].
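To make the representation concrete, a minimal simulation sketch of Eq. (7) is given below (NumPy assumed); the TARMA(2,1) parameter trajectories and the constant innovations variance are arbitrary smooth example choices, not the model used later in the paper.

```python
import numpy as np

def simulate_tarma(N, a_funcs, c_funcs, var_func, seed=0):
    """Simulate x[t] + sum_i a_i[t] x[t-i] = e[t] + sum_i c_i[t] e[t-i]  (Eq. (7))."""
    rng = np.random.default_rng(seed)
    na, nc = len(a_funcs), len(c_funcs)
    x, e = np.zeros(N), np.zeros(N)
    for t in range(N):
        e[t] = rng.normal(0.0, np.sqrt(var_func(t)))         # e[t] ~ NID(0, sigma_e^2[t])
        ar = sum(a_funcs[i](t) * x[t - 1 - i] for i in range(na) if t - 1 - i >= 0)
        ma = sum(c_funcs[i](t) * e[t - 1 - i] for i in range(nc) if t - 1 - i >= 0)
        x[t] = -ar + e[t] + ma                                # Eq. (7) solved for x[t]
    return x, e

# example TARMA(2,1): slowly drifting 'frozen' natural frequency, constant pole radius
N = 2000
a1 = lambda t: -2 * 0.98 * np.cos(np.pi * (0.05 + 0.02 * t / N))   # assumed trajectory
a2 = lambda t: 0.98 ** 2
c1 = lambda t: 0.4
x, e = simulate_tarma(N, [a1, a2], [c1], var_func=lambda t: 1.0)
```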


It is straightforward to verify that the minimum mean square error (MMSE) one-step-ahead prediction x̂[t/t−1] of the signal value x[t] made at time t−1 (that is for given values of the signal up to time t−1) is (note that the hat more generally designates estimator/estimate of the indicated quantity):

\hat{x}[t/t-1] = -\sum_{i=1}^{n_a} a_i[t]\, x[t-i] + \sum_{i=1}^{n_c} c_i[t]\, e[t-i]. \qquad (8)

Comparing this with the TARMA model of Eq. (7) one obtains that the one-step-ahead prediction error is equal to e[t], that is

e[t/t-1] \triangleq x[t] - \hat{x}[t/t-1] = e[t]. \qquad (9)

This is an important observation, as it indicates that, just as in the stationary case, the model's one-step-ahead prediction error (also referred to as the residual) coincides with the (uncorrelated) innovations generating the signal. This is, of course, valid as long as the true model parameters of Eq. (7) are used in the predictor equation (8).

Using the backshift operator B (B^i · x[t] ≜ x[t−i]), the TARMA representation of Eq. (7) is compactly re-written as

x[t] + \sum_{i=1}^{n_a} a_i[t]\, B^i\, x[t] = e[t] + \sum_{i=1}^{n_c} c_i[t]\, B^i\, e[t] \iff A(B, t)\, x[t] = C(B, t)\, e[t], \qquad e[t] \sim \mathrm{NID}\big(0, \sigma_e^2[t]\big) \qquad (10a)

with

A(B, t) \triangleq 1 + \sum_{i=1}^{n_a} a_i[t]\, B^i, \qquad C(B, t) \triangleq 1 + \sum_{i=1}^{n_c} c_i[t]\, B^i. \qquad (10b)

The polynomials A(B, t) and C(B, t) are referred to as the AR and MA time-dependent polynomial operators, respectively. Also note that one may also define a_0[t] = c_0[t] = 1.

It is, at this point, important to observe that, unlike in the stationary case, the backshift operator has to obey a non-commutative (skew) multiplication operation defined as [70]

B^i \cdot B^j = B^{i+j}, \qquad B^i \cdot x[t] = x[t-i] \cdot B^i. \qquad (11)

The reason for the introduction of this skew multiplication operation may be explained as follows. Consider, as a simple example, a two-term polynomial product of the form (1 + p_1[t]\,B)\,(1 + p_2[t]\,B). Applied, from the left, to any signal x[t], this operator has to satisfy the property

\{(1 + p_1[t]\,B)\,(1 + p_2[t]\,B)\}\, x[t] = (1 + p_1[t]\,B)\,\{(1 + p_2[t]\,B)\, x[t]\} \quad \forall\, x[t]
\iff \{1 + (p_1[t] + p_2[t])\,B + p_1[t]\,B \cdot p_2[t]\,B\}\, x[t] = x[t] + (p_1[t] + p_2[t])\, x[t-1] + p_1[t]\, p_2[t-1]\, x[t-2] \quad \forall\, x[t].

Obviously, this is valid if and only if the multiplication operation satisfies the properties of Eq. (11).

    As already indicated, depending upon the structure imposed upon the time evolution of their parameters,

    TARMA representations may be classified as unstructured parameter evolution, stochastic parameter

    evolution, and deterministic parameter evolution.

    2.1.1. Unstructured parameter evolution TARMA representations

    Unstructured parameter evolution TARMA representations impose no structure upon the time evolution

    of their parameters, which are thus free to change with time. Such a representation is thus directly

parameterized in terms of its time-dependent parameters a_i[t], c_i[t], σ_e²[t] ∀ t, while a specific model structure, say M, is defined by the representation orders na, nc, that is:

\mathcal{M} \triangleq \{n_a, n_c\}. \qquad (12)


    2.1.2. Stochastic parameter evolution TARMA representations

    Stochastic parameter evolution TARMA representations impose stochastic structure upon the time

    evolution of their parameters. The latter are thus considered as being random variables allowed to change with

    time, but with their evolution being subject to stochastic smoothness constraints. In a way, these reflect our

    prior knowledge regarding the evolution of the underlying dynamics. The constraints are often referred to as

smoothness priors constraints (for instance see [52–54,40,24,7]), and the models are referred to as smoothness priors TARMA (SP-TARMA) models, or, sometimes, also as doubly stochastic models.

The smoothness priors constraints constitute stochastic difference equations of the following typical forms [7]:

\Delta^k a_i[t] = w_{a_i}[t], \qquad \Delta^k c_i[t] = w_{c_i}[t] \qquad (13)

acting on each one of the AR and MA parameters (a_i[t] and c_i[t], respectively). In the above expressions k designates the difference equation order, Δ^k the k-th order difference operator (Δ ≜ 1 − B, so that Δa[t] = a[t] − a[t−1]; Δ^k ≜ (1 − B)^k), and w_{a_i}[t], w_{c_i}[t] zero-mean, uncorrelated (white), and also mutually uncorrelated and uncrosscorrelated with e[t], Gaussian sequences with (possibly) time-dependent variances σ²_{w_{a_i}}[t], σ²_{w_{c_i}}[t]. It is worth observing that the stochastic difference equations (13) represent integrated stochastic models characterized by AR roots on the unit circle, and thus describe homogeneously non-stationary evolutions [18, pp. 92–96]. The degree of smoothness of each such evolution is controlled by the corresponding white noise variance, and increases for decreasing variance.

A specific SP-TARMA model structure is defined by the model orders na, nc and the smoothness constraints order k, with the latter being typically assumed to be common for all AR and MA parameters. Hence

\mathcal{M}_{SP} \triangleq \{n_a, n_c; k\}. \qquad (14)

    2.1.3. Deterministic parameter evolution TARMA representations

    Deterministic parameter evolution TARMA representations impose deterministic structure upon

    the time evolution of their parameters. This is achieved by postulating model parameters as deterministic

    functions of time, belonging to specific functional subspaces (for instance see [41,67,27,44,68,34]).

    Such representations are often referred to as FS-TARMA representations (models). Their AR and MA

parameters, as well as their innovations variance, are all expanded within properly selected functional subspaces defined as:

\mathcal{F}_{AR} \triangleq \{G_{b_{a_1}}[t], G_{b_{a_2}}[t], \ldots, G_{b_{a_{p_a}}}[t]\},
\mathcal{F}_{MA} \triangleq \{G_{b_{c_1}}[t], G_{b_{c_2}}[t], \ldots, G_{b_{c_{p_c}}}[t]\},
\mathcal{F}_{\sigma_e^2} \triangleq \{G_{b_{s_1}}[t], G_{b_{s_2}}[t], \ldots, G_{b_{s_{p_s}}}[t]\}.

In these expressions F designates functional subspace of the indicated quantity and G_j[t] a set of orthogonal basis functions selected from a suitable family (such as Chebyshev, Legendre, other polynomial, trigonometric, or other functions). The AR, MA and variance subspace dimensionalities are indicated as p_a, p_c, p_s, respectively, while the indices b_{a_i} (i = 1, ..., p_a), b_{c_i} (i = 1, ..., p_c) and b_{s_i} (i = 1, ..., p_s) designate the specific basis functions of a particular family that are included in each subspace.

The time-dependent AR and MA parameters and the innovations variance of an FS-TARMA(na, nc)_[pa, pc, ps] representation may be thus expressed as

a_i[t] \triangleq \sum_{j=1}^{p_a} a_{i,j}\, G_{b_{a_j}}[t], \qquad c_i[t] \triangleq \sum_{j=1}^{p_c} c_{i,j}\, G_{b_{c_j}}[t], \qquad \sigma_e^2[t] \triangleq \sum_{j=1}^{p_s} s_j\, G_{b_{s_j}}[t] \qquad (15)

with a_{i,j}, c_{i,j} and s_j designating the AR, MA and innovations variance, respectively, coefficients of projection.

An FS-TARMA model is thus parameterized in terms of its projection coefficients a_{i,j}, c_{i,j}, s_j, while a specific model structure, say M_FS, is defined by the model orders na, nc and the functional subspaces F_AR, F_MA, F_{σe²}:

\mathcal{M}_{FS} \triangleq \{n_a, n_c; \mathcal{F}_{AR}, \mathcal{F}_{MA}, \mathcal{F}_{\sigma_e^2}\}. \qquad (16)
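As a small illustration of Eq. (15), the following sketch (NumPy assumed) constructs a time-dependent AR parameter trajectory from a Chebyshev functional subspace; the subspace dimensionality and the projection coefficients are arbitrary example values.

```python
import numpy as np

N  = 1000
t  = np.arange(1, N + 1)
tn = 2 * (t - 1) / (N - 1) - 1       # normalized time mapped to [-1, 1] for the basis

# functional subspace: the first pa Chebyshev polynomials T_0[t], ..., T_{pa-1}[t]
pa = 4
G = np.array([np.polynomial.chebyshev.Chebyshev.basis(j)(tn) for j in range(pa)])  # (pa, N)

# projection coefficients a_{1,j} of a single AR parameter a_1[t] (example values)
a1_proj = np.array([-1.2, 0.3, 0.1, -0.05])

a1_t = a1_proj @ G                   # Eq. (15): a_1[t] = sum_j a_{1,j} G_j[t]
```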


    It is noted that the FS-TARMA model class constitutes an attractive choice for the modelling of non-

    stationary random vibration signals, as the underlying dynamics are, in many cases, evolving in a deterministic

    and smooth way. Through proper selection of the functional subspaces, FS-TARMA models may model

    various types of evolution in the dynamics, including slow or fast, continuous or discontinuous evolution; for

    instance see Refs. [44, p. 215,20,27,16,33,8,10].

    3. Non-stationary TARMA model identification

Given a single, N-sample long, non-stationary signal record (realization) x^N ≜ {x[1], ..., x[N]} and a selected representation class (unstructured, stochastic, or deterministic parameter evolution), the TARMA identification problem may be stated as the problem of selecting the corresponding model structure, the model AR and MA parameters a_i[t] and c_i[t], respectively, and the innovations variance σ_e²[t] that best fit the available measurements. Model fitness may be understood in various ways, a common approach being in terms of

    predictive ability. This implies that the best model is the one characterized by minimal (one-step-ahead)

    prediction error. The methods based upon this principle minimize a function of the prediction error sequence

    (typically the residual sum of squares (RSS)) and are referred to as prediction error methods (PEM) [12, p. 199].

    The model structure includes the AR and MA orders na and nc, respectively, and, possibly, additional

    structural parameters, depending upon the particular representation class considered (see Eqs. (12), (14)

and (16) which define the model structure M / M_SP / M_FS for each class).

More formally, the identification problem may be defined as the selection of the best fitting (predicting) model from the set G of TARMA models corresponding to a particular class:

\mathcal{G} \triangleq \Big\{ \mathcal{M}\big(\theta^N, {\sigma_e^2}^N\big) : \ x[t] + \sum_{i=1}^{n_a} a_i[t; \theta[t]]\, x[t-i] = e[t; \theta^t] + \sum_{i=1}^{n_c} c_i[t; \theta[t]]\, e[t-i; \theta^{t-i}], \ \ \sigma_e^2[t; \theta[t]] = E\{e^2[t; \theta^t]\}, \ \ t = 1, \ldots, N \Big\}. \qquad (17)

In this expression θ[t] designates the instantaneous AR/MA parameter vector (notice that lower/upper case bold face symbols designate column-vector/matrix quantities, respectively; transposition is designated by the superscript T):

\theta[t] \triangleq \big[\, a_1[t] \ \ldots \ a_{n_a}[t] \ \vdots \ c_1[t] \ \ldots \ c_{n_c}[t] \,\big]^T_{(n_a + n_c) \times 1}, \qquad (18)

while θ^t stands for all AR and MA parameters up to time t, that is

\theta^t \triangleq \big[\, \theta^T[1] \ \theta^T[2] \ \ldots \ \theta^T[t] \,\big]^T_{(n_a + n_c)\,t \times 1}. \qquad (19)

Obviously, θ^N designates all AR/MA parameters at all time instants, and, similarly, {σ_e²}^N the residual variance at all time instants:

{\sigma_e^2}^N \triangleq \{\sigma_e^2[1], \sigma_e^2[2], \ldots, \sigma_e^2[N]\}. \qquad (20)

Observe that the signal model to be estimated in the above equation is, for purposes of clarity, shown as

    and the one-step-ahead prediction error (residual) signal are designated as functions of these parameters. For the

    residual signal, in particular, this signifies the fact that it is obtained based upon the current model parameters and

    the available vibration signal xN [using the TARMA model expression in Eq. (17)]. Thus, the best model may be

    selected/estimated as the model with parameters such that the prediction error signal is minimal.

    For purposes of practicality and conceptual simplicity, the model identification problem is usually

    distinguished into two subproblems: (a) the parameter estimation subproblem and (b) the model structure

    selection subproblem. These are separately treated in Sections 3.1 and 3.2, respectively.


    3.1. Model parameter estimation

    Model parameter estimation refers to the determination, for a given model form and structure, of the

AR/MA parameter vector θ[t] and the residual variance σ_e²[t].

3.1.1. Unstructured parameter evolution TARMA models

The estimation of unstructured parameter evolution TARMA models may be achieved by either one of the

    following two methods:

3.1.1.1. Short-time ARMA (ST-ARMA) estimation (for instance see [44, pp. 79–82, 45,29,32]). This method simply applies conventional, stationary, ARMA modelling to successive short time segments (of short duration, say M samples), within which the signal may be considered as approximately stationary.

    It is thus also referred to as a locally stationary method, and produces parameter estimates that remain

    constant within each segment (non-overlapping segment case). In the overlapping segment case (where the

    active segment is forwarded by a specified forward step, say m), parameter estimates are obtained every m

    samples.

    Stationary ARMA models are of the form of Eq. (7) but with parameters and innovations variance being

independent of time. Their estimation may be based upon well-known methods, such as the prediction error (PE) method, multistage methods, and so on [13,18,12].

    The critical quantity in this method is the segment duration (length) M. It is evident that a short duration

    may lead to crude (inaccurate) parameter estimates, whereas a long duration may not provide sufficient

    time resolution for adequately describing the evolution in the dynamics. Thus a compromise between

    achievable accuracy and time resolution is necessary. For this reason, the method is basically suitable for cases

    where the evolution in the dynamics is slow. The use of overlapping segments is generally useful, yet the

    segment duration remains as the critical compromising factor. An additional limiting factor in this method is

    the fact that the stationary ARMA models are typically estimated as invertible (that is with stable MA

    polynomial). This constraint may cause problems and have detrimental effects on the achievable accuracy (see

    Section 5).
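A minimal short-time estimation sketch is given below (NumPy assumed); for simplicity a pure AR model is fitted to each non-overlapping segment by ordinary least squares, whereas the ARMA case would require a stationary PE or multistage estimator per segment. The segment length and model order are arbitrary example choices.

```python
import numpy as np

def short_time_ar(x, order=4, seg_len=200):
    """Fit a stationary AR(order) model to successive non-overlapping segments."""
    params, variances = [], []
    for s in range(0, len(x) - seg_len + 1, seg_len):
        seg = x[s:s + seg_len]
        # regression form x[t] = -a_1 x[t-1] - ... - a_order x[t-order] + e[t]
        Phi = np.column_stack([-seg[order - i - 1: seg_len - i - 1] for i in range(order)])
        y = seg[order:]
        a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        res = y - Phi @ a
        params.append(a)                  # AR parameters, held constant over the segment
        variances.append(np.var(res))     # segment innovations variance estimate
    return np.array(params), np.array(variances)
```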

    3.1.1.2. Recursive TARMA estimation (for instance see [48,13, Chapter 9, 50,51, 12, Chapter 11,42]). Recursive,

    or else adaptive, TARMA methods have a rich history in non-stationary signal modelling. The key idea is the

formulation of an estimator of the AR/MA parameter vector θ[t] at each time instant t based upon the data available until that time, in a form that is recursively updated at the next time instant t+1, when the next signal sample x[t+1] is processed. For this reason TARMA models estimated recursively are sometimes referred to as recursive ARMA (RARMA) models.

Although there are several variations of recursive methods, attention is presently focused on a method based upon the following exponentially weighted prediction error criterion [12, pp. 363, 368, 13, p. 324]:

\hat{\theta}[t] = \arg\min_{\theta[t]} \sum_{\tau=1}^{t} \lambda^{t-\tau}\, e^2[\tau; \theta^{\tau-1}] \qquad (21)

with

e[t; \theta^{t-1}] \triangleq x[t] + \sum_{i=1}^{n_a} a_i[t-1]\, x[t-i] - \sum_{i=1}^{n_c} c_i[t-1]\, e[t-i; \theta^{t-i-1}] \approx e[t; \theta^t].

In these expressions arg min designates minimizing argument, and e[t; θ^{t−1}] the model's one-step-ahead prediction error (residual) made at time t−1, that is without knowing the model parameter values at time t as would normally be necessary. Of course, as indicated in the expression above, e[t; θ^{t−1}] ≈ e[t; θ^t] for slow parameter evolution. The term λ^{t−τ} is a window or weighting function that, for λ ∈ (0, 1], assigns more weight to more recent errors. It may be also rewritten as follows [12, pp. 378–379]:

\lambda^{t-\tau} = \big(e^{\ln \lambda}\big)^{t-\tau} = e^{(t-\tau)\ln\lambda} \approx e^{-(t-\tau)(1-\lambda)}.


The quantity 1/(1 − λ) is referred to as the memory time constant, while λ is referred to as the forgetting factor. The smaller the value of λ, the faster older values of the error (and thus the signal) are forgotten, thus increasing the estimator's adaptability (its ability to track the evolution in the dynamics). Yet, at the same time, the accuracy of the estimator decreases, as its covariance increases [12, pp. 381–382]. Therefore, the selection of λ is critical and represents the basic tradeoff between tracking ability in the dynamics and achievable parameter accuracy.

For this reason recursive TARMA methods are mainly suitable for slow evolution in the dynamics, in which case adequate tracking may be achieved by a value of λ relatively close to unity. In any case, the value of λ may be optimized (estimated) based upon a suitable criterion, such as minimization of the residual

    (prediction error) sum of squares (RSS).

The recursive estimation of θ[t] based upon the above criterion is accomplished via the recursive maximum likelihood (RML) method, which may be summarized as follows [12, p. 372]:

Estimator update:
\hat{\theta}[t] = \hat{\theta}[t-1] + \mathbf{k}[t]\, e[t|t-1]. \qquad (22a)

Prediction error:
e[t|t-1] = x[t] - \hat{x}[t|t-1] = x[t] - \boldsymbol{\phi}^T[t]\, \hat{\theta}[t-1]. \qquad (22b)

Gain:
\mathbf{k}[t] = \frac{\mathbf{P}[t-1]\, \boldsymbol{\psi}[t]}{\lambda + \boldsymbol{\psi}^T[t]\, \mathbf{P}[t-1]\, \boldsymbol{\psi}[t]}. \qquad (22c)

Covariance update:
\mathbf{P}[t] = \frac{1}{\lambda}\left(\mathbf{P}[t-1] - \frac{\mathbf{P}[t-1]\, \boldsymbol{\psi}[t]\, \boldsymbol{\psi}^T[t]\, \mathbf{P}[t-1]}{\lambda + \boldsymbol{\psi}^T[t]\, \mathbf{P}[t-1]\, \boldsymbol{\psi}[t]}\right). \qquad (22d)

Filtering:
\boldsymbol{\psi}[t] + \hat{c}_1[t-1]\, \boldsymbol{\psi}[t-1] + \ldots + \hat{c}_{n_c}[t-1]\, \boldsymbol{\psi}[t-n_c] \triangleq \boldsymbol{\phi}[t], \qquad (22e)

\boldsymbol{\phi}[t] \triangleq \big[\, -x[t-1] \ \ldots \ -x[t-n_a] \ \vdots \ e[t-1|t-1] \ \ldots \ e[t-n_c|t-n_c] \,\big]^T. \qquad (22f)

A posteriori error:
e[t|t] = x[t] - \boldsymbol{\phi}^T[t]\, \hat{\theta}[t] \qquad (22g)

with x̂[t|t−1] indicating the one-step-ahead prediction of the signal at time t made at time t−1, and e[t|t−1] = e[t; θ^{t−1}] the corresponding prediction error. k[t] stands for the adaptation gain, and P[t] for the model parameter covariance matrix. This method is henceforth referred to as the RML-TARMA method.

For the initialization of the method it is customary to set θ̂[0] = 0, P[0] = αI (where α stands for a large positive number and I the identity matrix), and the initial signal and a posteriori error values to zero. A simple method that may be used to reduce the effects of the arbitrary initial conditions is to apply the recursions on the available signal in sequential phases (for instance a forward pass, a backward pass and a final forward pass).

Notice that the filtering operation of Eq. (22e) requires stability of the MA polynomial operator. Such a constraint is typically incorporated into the algorithm, and unstable (in the instantaneous sense) roots of the estimated MA polynomial are stabilized (reflected, in a way that preserves the 'frozen' spectrum, within the unit circle). Yet, this may have detrimental effects on the achievable accuracy (see Section 5).

The innovations (one-step-ahead prediction error) variance σ_e²[t] may be estimated via a window of length, say, 2M+1, centred at the time instant t, that slides over the prediction error (residual) sequence (non-causal moving average filter), that is:

\hat{\sigma}_e^2[t] = \frac{1}{2M+1} \sum_{\tau = t-M}^{t+M} e^2[\tau|\tau-1]. \qquad (23)
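A compact sketch of the recursive update for a pure TAR model under the exponentially weighted criterion of Eq. (21), that is recursive least squares with a forgetting factor, is given below (NumPy assumed); the MA filtering step (22e) of the full RML algorithm is omitted, and the sliding-window variance estimate of Eq. (23) is appended. The order, forgetting factor, initialization constant and window length are example values.

```python
import numpy as np

def recursive_tar(x, na=4, lam=0.99, alpha=1e4, M=50):
    """Forgetting-factor recursive estimation of a TAR(na) model, plus Eq. (23)."""
    N = len(x)
    theta = np.zeros(na)                     # theta_hat[0] = 0
    P = alpha * np.eye(na)                   # P[0] = alpha * I
    theta_hist = np.zeros((N, na))
    err = np.zeros(N)
    for t in range(na, N):
        phi = -x[t - na: t][::-1]                       # [-x[t-1], ..., -x[t-na]]
        err[t] = x[t] - phi @ theta                     # prediction error, cf. (22b)
        k = P @ phi / (lam + phi @ P @ phi)             # gain, cf. (22c)
        theta = theta + k * err[t]                      # estimator update, cf. (22a)
        P = (P - np.outer(k, phi @ P)) / lam            # covariance update, cf. (22d)
        theta_hist[t] = theta
    # innovations variance via a sliding non-causal window, Eq. (23)
    var = np.array([np.mean(err[max(0, t - M): t + M + 1] ** 2) for t in range(N)])
    return theta_hist, var
```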

3.1.2. Stochastic parameter evolution TARMA models [52–54,40,24,7,44, Chapter 7]

    Model parameter estimation for TARMA models subject to the stochastic smoothness constraints (SP-

    TARMA models) of Eq. (13) may be developed by setting the latter, along with the TARMA representation

    expression, into linear state space form. The estimation of such a model then constitutes a tradeoff between


infidelity to the data set x^N and infidelity to the k-th order difference equation constraints. Like in the previous

    case, parameter estimation is recursive.

To illustrate the ideas, consider, for instance, the TAR(na) case and a second order (k = 2) stochastic smoothness constraint, Δ² a_i[t] = w_{a_i}[t], that is a_i[t] = 2 a_i[t−1] − a_i[t−2] + w_{a_i}[t].

In a similar manner, it may be shown that the general (k-th order) smoothness constraint of Eq. (13) may be expressed as:

\mathbf{z}[t] = \mathbf{F} \cdot \mathbf{z}[t-1] + \mathbf{G} \cdot \mathbf{w}[t] \qquad (24a)

with

\mathbf{z}[t] \triangleq \big[\, a_1[t] \ \ldots \ a_{n_a}[t] \ \vdots \ \ldots \ \vdots \ a_1[t-k+1] \ \ldots \ a_{n_a}[t-k+1] \,\big]^T_{k n_a \times 1}, \qquad (24b)

\mathbf{w}[t] \triangleq \big[\, w_{a_1}[t] \ w_{a_2}[t] \ \ldots \ w_{a_{n_a}}[t] \,\big]^T_{n_a \times 1} \qquad (24c)

and, for the second order (k = 2) case considered above,

\mathbf{F} = \begin{bmatrix} 2\mathbf{I}_{n_a} & -\mathbf{I}_{n_a} \\ \mathbf{I}_{n_a} & \mathbf{0}_{n_a} \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} \mathbf{I}_{n_a} \\ \mathbf{0}_{n_a} \end{bmatrix} \qquad (24d)

and so on, where I_n and 0_n designate the n × n dimensional identity and zero matrices, respectively. As indicated by the above expressions, z[t] forms a state vector (see Eq. (24a)), whereas w[t] consists of the scalar innovations entering in each constraint expression.

On the other hand, the TAR(na) representation may be expressed as

x[t] = \mathbf{h}^T[t] \cdot \mathbf{z}[t] + e[t] \qquad (25a)


with

\mathbf{h}[t] \triangleq \big[\, -x[t-1] \ \ldots \ -x[t-n_a] \ \vdots \ 0 \ \ldots \ 0 \,\big]^T_{k n_a \times 1}. \qquad (25b)

Therefore, based upon Eqs. (24a) and (25a), the smoothness priors TAR(na) representation may be completely expressed by the state space form [7]:

\mathbf{z}[t] = \mathbf{F} \cdot \mathbf{z}[t-1] + \mathbf{G} \cdot \mathbf{w}[t], \qquad x[t] = \mathbf{h}^T[t] \cdot \mathbf{z}[t] + e[t], \qquad (26a)

\mathbf{w}[t] \sim \mathrm{NID}\big(\mathbf{0}, \mathbf{Q}[t]\big), \qquad e[t] \sim \mathrm{NID}\big(0, \sigma_e^2[t]\big). \qquad (26b)

Note that in the above expressions NID(·,·) stands for normally independently distributed with the indicated mean and covariance, while Q[t] ≜ E{w[t]·w^T[t]} designates the covariance matrix of the constraint model innovations vector w[t].

In view of the above, and the definition of the state vector z[t] (Eq. (24b)), estimation of the model parameters at each time instant corresponds to estimation of the state vector given the measured signal. This may be achieved via the Kalman filter (KF) approach as follows [54,53]:

Time update (prediction):
State prediction: \hat{\mathbf{z}}[t|t-1] = \mathbf{F} \cdot \hat{\mathbf{z}}[t-1|t-1].
Prediction error: e[t|t-1] = x[t] - \mathbf{h}^T[t] \cdot \hat{\mathbf{z}}[t|t-1].
Covariance prediction: \mathbf{P}[t|t-1] = \mathbf{F} \cdot \mathbf{P}[t-1|t-1] \cdot \mathbf{F}^T + \mathbf{G} \cdot \mathbf{Q}[t] \cdot \mathbf{G}^T. \qquad (27a)

Observation update (filtering):
Gain: \mathbf{k}[t] = \mathbf{P}[t|t-1] \cdot \mathbf{h}[t] \cdot \big(\mathbf{h}^T[t] \cdot \mathbf{P}[t|t-1] \cdot \mathbf{h}[t] + \sigma_e^2[t]\big)^{-1}.
State update: \hat{\mathbf{z}}[t|t] = \hat{\mathbf{z}}[t|t-1] + \mathbf{k}[t] \cdot e[t|t-1].
Covariance update: \mathbf{P}[t|t] = \big(\mathbf{I} - \mathbf{k}[t] \cdot \mathbf{h}^T[t]\big) \cdot \mathbf{P}[t|t-1]. \qquad (27b)

In these expressions the theoretical innovations (prediction error, residual) variance σ_e²[t] is assumed to be a priori known. Nevertheless, σ_e²[t] is normally unavailable. This problem may be tackled by rewriting the gain and the covariance prediction/update equations in the following normalized form [44, p. 230]:

\mathbf{k}[t] = \frac{\mathbf{P}[t|t-1]}{\sigma_e^2[t]}\, \mathbf{h}[t] \left(\mathbf{h}^T[t]\, \frac{\mathbf{P}[t|t-1]}{\sigma_e^2[t]}\, \mathbf{h}[t] + 1\right)^{-1},

\frac{\mathbf{P}[t|t-1]}{\sigma_e^2[t]} = \mathbf{F}\, \frac{\mathbf{P}[t-1|t-1]}{\sigma_e^2[t]}\, \mathbf{F}^T + \mathbf{G}\, \frac{\mathbf{Q}[t]}{\sigma_e^2[t]}\, \mathbf{G}^T,

\frac{\mathbf{P}[t|t]}{\sigma_e^2[t]} = \big(\mathbf{I} - \mathbf{k}[t]\, \mathbf{h}^T[t]\big)\, \frac{\mathbf{P}[t|t-1]}{\sigma_e^2[t]},


which, by assuming that the innovations variance is slowly varying, such that P[t−1|t−1]/σ_e²[t] ≈ P[t−1|t−1]/σ_e²[t−1], allows for rewriting the Kalman filter equations (27a,b) in the following normalized form:

Time update (prediction):
State prediction: \hat{\mathbf{z}}[t|t-1] = \mathbf{F} \cdot \hat{\mathbf{z}}[t-1|t-1].
Prediction error: e[t|t-1] = x[t] - \mathbf{h}^T[t] \cdot \hat{\mathbf{z}}[t|t-1].
Covariance prediction: \tilde{\mathbf{P}}[t|t-1] = \mathbf{F} \cdot \tilde{\mathbf{P}}[t-1|t-1] \cdot \mathbf{F}^T + \mathbf{G} \cdot \tilde{\mathbf{Q}}[t] \cdot \mathbf{G}^T. \qquad (28a)

Observation update (filtering):
Gain: \mathbf{k}[t] = \tilde{\mathbf{P}}[t|t-1] \cdot \mathbf{h}[t] \cdot \big(\mathbf{h}^T[t] \cdot \tilde{\mathbf{P}}[t|t-1] \cdot \mathbf{h}[t] + 1\big)^{-1}.
State update: \hat{\mathbf{z}}[t|t] = \hat{\mathbf{z}}[t|t-1] + \mathbf{k}[t] \cdot e[t|t-1].
Covariance update: \tilde{\mathbf{P}}[t|t] = \big(\mathbf{I} - \mathbf{k}[t] \cdot \mathbf{h}^T[t]\big) \cdot \tilde{\mathbf{P}}[t|t-1], \qquad (28b)

\tilde{\mathbf{P}}[t|t] \triangleq \frac{\mathbf{P}[t|t]}{\sigma_e^2[t]}; \qquad \tilde{\mathbf{P}}[t|t-1] \triangleq \frac{\mathbf{P}[t|t-1]}{\sigma_e^2[t]}; \qquad \tilde{\mathbf{Q}}[t] \triangleq \frac{\mathbf{Q}[t]}{\sigma_e^2[t]} = \underbrace{\frac{\sigma_w^2[t]}{\sigma_e^2[t]}}_{\nu[t]} \cdot \mathbf{I}_{n_a} \qquad (28c)

with P̃[·|·] designating the normalized state covariance matrix, Q̃[t] the normalized constraint model covariance matrix, and ν[t] the ratio of the constraint model innovations variance over the residual variance. This ratio, which is for simplicity assumed to be constant (ν[t] = ν) in the rest of the paper, constitutes a user-selected design parameter that controls the equivalent memory of the estimation algorithm (similar to the forgetting factor in the recursive TARMA estimation of Eq. (21)) [44, pp. 232, 237–242]. Of course, it is possible to optimize (estimate) ν based upon a suitable criterion (maximization of the likelihood or minimization of the residual sum of squares) [52]. For initialization of the Kalman filter, typical selections are ẑ[0|0] = 0 and P̃[0|0] = αI, with α being a large positive constant and I the identity matrix. This method is henceforth referred to as the SP-TARMA method.
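For illustration, a compact sketch of the normalized Kalman filter (28a,b) for an SP-TAR model with first order (k = 1, random walk) smoothness constraints is given below (NumPy assumed); in this special case F = G = I, so the state coincides with the AR parameter vector. The order, the ratio nu and the initialization constant are example design values.

```python
import numpy as np

def sp_tar_kf(x, na=4, nu=1e-3, alpha=1e4):
    """SP-TAR(na) estimation with k = 1 smoothness priors via the normalized KF (28a,b)."""
    N = len(x)
    z = np.zeros(na)                   # state = AR parameter vector (F = G = I for k = 1)
    P = alpha * np.eye(na)             # normalized covariance, P~[0|0] = alpha * I
    Q = nu * np.eye(na)                # normalized constraint covariance, Eq. (28c)
    z_hist = np.zeros((N, na))
    err = np.zeros(N)
    for t in range(na, N):
        h = -x[t - na: t][::-1]        # h[t] = [-x[t-1], ..., -x[t-na]]
        P = P + Q                      # time update: z[t|t-1] = z[t-1|t-1], P~ += Q~
        err[t] = x[t] - h @ z          # prediction error e[t|t-1]
        k = P @ h / (h @ P @ h + 1.0)  # gain
        z = z + k * err[t]             # state (parameter) update
        P = P - np.outer(k, h @ P)     # covariance update
        z_hist[t] = z
    return z_hist, err
```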

The smoothed estimate option: Once the prediction and filtering operations have been performed, a smoothed estimate ẑ[t|N] of the state vector given the entire data set x^N may be optionally obtained via the following backward smoothing algorithm [53,54]:

Smoothing:
\mathbf{A}[t] = \tilde{\mathbf{P}}[t|t] \cdot \mathbf{F}^T \cdot \tilde{\mathbf{P}}^{-1}[t+1|t],
\hat{\mathbf{z}}[t|N] = \hat{\mathbf{z}}[t|t] + \mathbf{A}[t] \cdot \big(\hat{\mathbf{z}}[t+1|N] - \hat{\mathbf{z}}[t+1|t]\big),
\tilde{\mathbf{P}}[t|N] = \tilde{\mathbf{P}}[t|t] + \mathbf{A}[t] \cdot \big(\tilde{\mathbf{P}}[t+1|N] - \tilde{\mathbf{P}}[t+1|t]\big) \cdot \mathbf{A}^T[t]. \qquad (29)

This method is henceforth referred to as the SP-TARMA method with posterior smoothing (SP-TARMA (smoothed)).

    Innovations variance estimation: This may be achieved either via the scheme described in reference [54], or

    via that of the previous (RML-TARMA) method.

The TARMA case: In the complete TARMA case, the second of Eqs. (26a) is a non-linear function of z^t:

x[t] = \mathbf{h}^T\big[t; \mathbf{z}^{t-1}\big] \cdot \mathbf{z}[t] + e[t] \qquad (30a)


with

\mathbf{h}\big[t; \mathbf{z}^{t-1}\big] \triangleq \big[\, -x[t-1] \ \ldots \ -x[t-n_a] \ \vdots \ e[t-1; \mathbf{z}^{t-1}] \ \ldots \ e[t-n_c; \mathbf{z}^{t-n_c}] \ \vdots \ 0 \ \ldots \ 0 \,\big]^T_{k(n_a + n_c) \times 1}, \qquad (30b)

\mathbf{z}[t] \triangleq \big[\, a_1[t] \ \ldots \ a_{n_a}[t] \ c_1[t] \ \ldots \ c_{n_c}[t] \ \vdots \ \ldots \ \vdots \ a_1[t-k+1] \ \ldots \ a_{n_a}[t-k+1] \ c_1[t-k+1] \ \ldots \ c_{n_c}[t-k+1] \,\big]^T_{k(n_a + n_c) \times 1} \qquad (30c)

and z^t designating a vector containing all state vectors z[τ] up to time t. SP-TARMA parameter estimation may be then based upon the extended Kalman filter (EKF) algorithm [71, pp. 284–285].

    Nevertheless, experience has shown that in many circumstances the EKF algorithm does not work well in

    practice [71, p. 284]. An alternative, simple, possibility is to use an extended least squares (ELS)-like

algorithm, by replacing the theoretical prediction errors e[τ; z^τ] in Eq. (30b) with their respective posterior estimates e[τ|τ] (which are then treated as 'measurements' for this type of approximation; for instance see [44, p. 263]). SP-TARMA parameter estimation may be then achieved via the ordinary Kalman filter algorithm of Eqs. (28a,b) with

\mathbf{h}[t] = \big[\, -x[t-1] \ \ldots \ -x[t-n_a] \ \vdots \ e[t-1|t-1] \ \ldots \ e[t-n_c|t-n_c] \ \vdots \ 0 \ \ldots \ 0 \,\big]^T_{k(n_a + n_c) \times 1}, \qquad (31)

e[t|t] = x[t] - \mathbf{h}^T[t] \cdot \hat{\mathbf{z}}[t|t].

    3.1.3. Deterministic parameter evolution TARMA models

    The problem of parameter estimation for deterministic parameter evolution TARMA models, that is FS-

    TARMA models, consists of determining the AR/MA and innovations variance projection coefficient vectors

ϑ and s, respectively:

\boldsymbol{\vartheta} \triangleq \big[\, \mathbf{a}^T \ \vdots \ \mathbf{c}^T \,\big]^T_{(n_a p_a + n_c p_c) \times 1}, \qquad \mathbf{s} \triangleq \big[\, s_1 \ \ldots \ s_{p_s} \,\big]^T_{p_s \times 1}, \qquad (32a)

where a and c represent the corresponding AR and MA projection coefficient vectors:

\mathbf{a} \triangleq \big[\, a_{1,1} \ \ldots \ a_{1,p_a} \ \vdots \ \ldots \ \vdots \ a_{n_a,1} \ \ldots \ a_{n_a,p_a} \,\big]^T_{n_a p_a \times 1}, \qquad (32b)

\mathbf{c} \triangleq \big[\, c_{1,1} \ \ldots \ c_{1,p_c} \ \vdots \ \ldots \ \vdots \ c_{n_c,1} \ \ldots \ c_{n_c,p_c} \,\big]^T_{n_c p_c \times 1}. \qquad (32c)

With this notation, the general TARMA model of Eq. (17) may be specifically re-written in the compact form (compare with Eq. (10a)):

A(B, t; \mathbf{a})\, x[t] = C(B, t; \mathbf{c})\, e[t; \boldsymbol{\vartheta}], \qquad E\{e^2[t; \boldsymbol{\vartheta}]\} = \sigma_e^2[t; \boldsymbol{\vartheta}]. \qquad (33)

Estimation of the parameter vector ϑ may be based upon a prediction error (PE) criterion consisting of the sum of squares of the model's one-step-ahead prediction errors (residual sum of squares, RSS), that is (for instance see [41,66,67]):

\hat{\boldsymbol{\vartheta}} = \arg\min_{\boldsymbol{\vartheta}} \sum_{t=1}^{N} e^2[t; \boldsymbol{\vartheta}], \qquad (34)

with arg min designating minimizing argument.

In the pure autoregressive (TAR) case the model is of the form (C(B, t; c) = 1 in Eq. (33)):

A(B, t; \mathbf{a})\, x[t] = e[t; \mathbf{a}] \;\Longrightarrow\; x[t] + \sum_{i=1}^{n_a} \sum_{j=1}^{p_a} a_{i,j}\, G_{b_{a_j}}[t]\, x[t-i] = e[t; \mathbf{a}] \;\Longleftrightarrow\; x[t] = \boldsymbol{\phi}_A^T[t] \cdot \mathbf{a} + e[t; \mathbf{a}] \qquad (35)

with

\boldsymbol{\phi}_A[t] \triangleq \big[\, -G_{b_{a_1}}[t]\, x[t-1] \ \ldots \ -G_{b_{a_{p_a}}}[t]\, x[t-n_a] \,\big]^T_{n_a p_a \times 1}.


Since the residual e[t; a] depends linearly upon the parameter vector a, minimization of the PE criterion of Eq. (34) leads to the ordinary least squares (OLS) estimator:

\hat{\mathbf{a}} = \left[\frac{1}{N} \sum_{t=1}^{N} \boldsymbol{\phi}_A[t] \cdot \boldsymbol{\phi}_A^T[t]\right]^{-1} \cdot \left[\frac{1}{N} \sum_{t=1}^{N} \boldsymbol{\phi}_A[t] \cdot x[t]\right]. \qquad (36)
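A direct implementation sketch of Eqs. (35)-(36) follows (NumPy assumed); the regression matrix is assembled from the products G_j[t] x[t-i], and the projection coefficients are obtained by linear least squares, which is numerically preferable to forming the normal equations of Eq. (36) explicitly. The Chebyshev basis and the dimensionalities are example choices.

```python
import numpy as np

def fs_tar_ols(x, na=4, pa=4):
    """FS-TAR(na)_[pa] estimation: least squares solution of Eq. (36), Chebyshev basis."""
    N = len(x)
    tn = 2 * np.arange(N) / (N - 1) - 1                       # time mapped to [-1, 1]
    G = np.array([np.polynomial.chebyshev.Chebyshev.basis(j)(tn) for j in range(pa)])
    rows, y = [], []
    for t in range(na, N):
        # regressor phi_A[t]: entries -G_j[t] * x[t-i] for i = 1..na, j = 1..pa (Eq. (35))
        rows.append(np.concatenate([-G[:, t] * x[t - i] for i in range(1, na + 1)]))
        y.append(x[t])
    Phi, y = np.array(rows), np.array(y)
    a_proj, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # projection coefficients
    residuals = y - Phi @ a_proj                              # e[t; a_hat] for t >= na
    a_t = a_proj.reshape(na, pa) @ G                          # Eq. (15): a_i[t] over time
    return a_proj, a_t, residuals
```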

The estimation of the innovations variance projection coefficients may be achieved by the following procedure. An initial estimate of the estimated residual series e(t,\hat a) variance is first obtained via a non-causal moving average filter (using a sliding time window) as follows:

\hat\sigma_e^2(t) = \frac{1}{2M+1}\sum_{\tau=t-M}^{t+M} e^2(\tau,\hat a)   (37)

with 2M+1 designating the window length. An initial estimate of the projection coefficient vector s may then be obtained by fitting the obtained variance \hat\sigma_e^2(t) to a selected functional subspace F_{\sigma_e^2}. This leads to the overdetermined set of equations:

\hat\sigma_e^2(t) = \sum_{j=1}^{p_s} s_j\, G_{b_{s_j}}(t) = g^T(t)\cdot s   (38)

where

g(t) \triangleq [\, G_{b_{s_1}}(t) \;\; G_{b_{s_2}}(t) \;\ldots\; G_{b_{s_{p_s}}}(t) \,]^T_{[p_s\times 1]}.

This may be solved for the coefficients of projection s_j in a linear least squares sense.
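A brief sketch of this two-step procedure (Eqs. (37)-(38)) follows. The matrix G_sigma holding the selected variance basis functions (for instance produced by a helper like the basis sketch above) and the simple truncation of the sliding window at the record edges are illustrative assumptions.

```python
import numpy as np

def innovations_variance_projection(residuals, M, G_sigma):
    """Initial estimate of the innovations-variance projection coefficients s:
    a (2M+1)-sample centred moving-average estimate of the residual variance
    (Eq. (37)) is fitted, in the least-squares sense, to the variance subspace
    spanned by the rows of G_sigma (shape ps x N), as in Eq. (38)."""
    N = len(residuals)
    var_hat = np.empty(N)
    for t in range(N):                                # non-causal sliding window
        lo, hi = max(0, t - M), min(N, t + M + 1)
        var_hat[t] = np.mean(residuals[lo:hi] ** 2)
    s_hat, *_ = np.linalg.lstsq(G_sigma.T, var_hat, rcond=None)
    return s_hat, var_hat
```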

The obtained initial estimate may be subsequently refined via maximum likelihood (ML) estimation. Accordingly, the refined estimator maximizes the log-likelihood of the residual variance projection vector s given the residual series e(t,\hat a), that is the joint probability density function of the residual series e(t,\hat a) (now treated as available measurements), with respect to the residual variance projection vector s (see Section 3.2; also [72,73]):

\hat s = \arg\max_{s} \Big\{ -\frac{1}{2}\sum_{t=1}^{N} \Big[ \ln\big(g^T(t)\cdot s\big) + \frac{e^2(t,\hat a)}{g^T(t)\cdot s} \Big] \Big\}   (39)

subject to the constraint g^T(t)\cdot s > 0. Estimation of s based upon this procedure constitutes a constrained non-linear optimization problem, and is tackled via iterative optimization techniques that employ the previously obtained initial estimate as the starting point.
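The refinement of Eq. (39) may, for instance, be sketched with a generic constrained optimizer as follows; the use of SciPy's SLSQP routine, the variance clipping safeguard, and the function names are illustrative choices rather than part of the surveyed methods.

```python
import numpy as np
from scipy.optimize import minimize

def refine_variance_projection(residuals, G_sigma, s0):
    """Refine the variance projection coefficients s by maximizing the
    log-likelihood of Eq. (39), subject to g^T(t) s > 0 for all t, starting
    from the least-squares estimate s0."""
    e2 = residuals ** 2

    def neg_loglike(s):
        var = np.maximum(G_sigma.T @ s, 1e-10)        # sigma_e^2(t) = g^T(t) s
        return 0.5 * np.sum(np.log(var) + e2 / var)   # negative of Eq. (39)

    cons = {'type': 'ineq', 'fun': lambda s: G_sigma.T @ s}   # positivity constraint
    res = minimize(neg_loglike, s0, constraints=cons, method='SLSQP')
    return res.x
```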

Improved FS-TAR estimation may be achieved via the ML method, which maximizes the log-likelihood of the unknown vectors \vartheta, s given the signal measurements x^N (see [72,73]).

In the autoregressive moving average (TARMA) case the residual e(t,\vartheta) depends non-linearly upon the MA projection coefficient vector c (see Eq. (33)), which implies that computation of the prediction error (PE) estimator of Eq. (34) constitutes a non-quadratic problem that has to be tackled via non-linear optimization techniques. The nature of this criterion necessitates the use of quite accurate initial parameter values. These may be based upon linear multi-stage methods or recursive methods, with prediction error based estimation viewed as a subsequent (potential) refinement (PE-based refinement). As in the FS-TAR case, ML estimation may also be considered (see [72,73]).

Two linear multi-stage methods, the two stage least squares (2SLS) method [65,34,8,10] and the polynomial-algebraic (P-A) method [67-69,75], as well as a recursive method, the recursive extended least squares (RELS) method [44,63,64], are discussed next.

The linear multi-stage methods aim at overcoming the difficulties associated with non-linear optimization by approximating the original prediction error problem by a sequence of subproblems that may be tackled via exclusively linear techniques. They utilize an infinite order TAR representation (that is a TAR(\infty)


representation, also referred to as the inverse function representation) of the original TARMA representation of Eq. (10a), which may be obtained by pre-multiplying the latter by C^{-1}(B,t) (see [76] on the existence of the inverse of a time-varying polynomial operator):

C^{-1}(B,t)\cdot A(B,t)\cdot x(t) = C^{-1}(B,t)\cdot C(B,t)\cdot e(t) \;\Longleftrightarrow\; I(B,t)\cdot x(t) = e(t) \;\Longleftrightarrow\; x(t) = -\sum_{r=1}^{\infty} i_r(t)\, x(t-r) + e(t)   (40a)

with

I(B,t) \triangleq C^{-1}(B,t)\cdot A(B,t) = 1 + \sum_{r=1}^{\infty} i_r(t)\, B^r   (40b)

and I(B,t) designating the inverse function polynomial operator. As shown in [75], the functional basis F^{i_r(t)} of each i_r(t) is, in the polynomial functional subspaces case, related to the AR and MA functional bases F^{AR} and F^{MA}, respectively, through the expression:

F^{i_r(t)} = \begin{cases} F^{AR}\cup F^{MA}, & r = 1, \\ \{\, G_0(t), \ldots, G_{(r-1)b_c+b}(t) \,\}, & r \ge 2. \end{cases}   (41)


Step 2: AR/MA projection coefficient estimation. Estimates of the projection coefficient vector \vartheta are obtained by replacing the theoretical residuals in the TARMA model of Eq. (33) with the previously obtained \hat e(t,\hat i):

A(B,t;a)\cdot x(t) - [\,C(B,t;c)-1\,]\cdot \hat e(t,\hat i) = e(t,\vartheta)

\;\Longleftrightarrow\; x(t) = -\sum_{i=1}^{n_a}\sum_{j=1}^{p_a} a_{i,j}\, G_{b_{a_j}}(t)\, x(t-i) + \sum_{i=1}^{n_c}\sum_{j=1}^{p_c} c_{i,j}\, G_{b_{c_j}}(t)\, \hat e(t-i,\hat i) + e(t,\vartheta)

\;\Longleftrightarrow\; x(t) = \varphi^T(t)\cdot\vartheta + e(t,\vartheta)   (44)

with \varphi(t) designating the regression vector:

\varphi(t) \triangleq [\, -G_{b_{a_1}}(t)\, x(t-1) \;\ldots\; -G_{b_{a_{p_a}}}(t)\, x(t-n_a) \;\vdots\; G_{b_{c_1}}(t)\, \hat e(t-1,\hat i) \;\ldots\; G_{b_{c_{p_c}}}(t)\, \hat e(t-n_c,\hat i) \,]^T_{[(n_a p_a + n_c p_c)\times 1]}.

As the residual e(t,\vartheta) depends linearly upon the AR/MA coefficient of projection vector \vartheta, estimation of the latter may be achieved via minimization of the residual sum of squares by using an ordinary least squares (OLS) estimator.

Step 3: Innovations variance estimation. The innovations (residual) sequence may be estimated based upon the estimate \hat\vartheta and the TARMA model expression (Eq. (33)). Its variance may be estimated via the procedure described in the pure TAR case.

Remarks. The functional subspaces of the inverse function parameters (used in Step 1) may, in certain cases (for instance in cases of high MA subspace dimensionalities), become of quite high dimensionalities (see Eq. (41)). This may lead to statistically unreliable estimates if the ratio of the number of available signal samples over the number of projection coefficients to be estimated is smaller than, say, 20 (N/\sum_{r=1}^{n_i} p_{i_r} < 20). In such a case it is preferable to use truncated functional subspaces for the inverse function parameters.

The Polynomial-algebraic (P-A) method [67-69,75]: The method consists of five steps which are briefly described in the sequel.

Step 1: Inverse function estimation. The estimation of the truncated order inverse function operator I(B,t;\hat i) (model of Eq. (42)) is accomplished as in Step 1 of the 2SLS method.

Step 2: Initial estimation of the AR/MA coefficients of projection. Initial estimates of the AR and MA coefficients of projection are obtained based upon the definition of the inverse function operator I(B,t) (Eq. (40b)). Indeed, replacing I(B,t) by its estimated counterpart I(B,t;\hat i) gives:

I(B,t;\hat i) = C^{-1}(B,t;c)\cdot A(B,t;a) \;\Longleftrightarrow\; A(B,t;a) = C(B,t;c)\cdot I(B,t;\hat i).   (45)

Initial estimates of the AR and MA coefficients of projection may then be obtained by deconvolving this expression (details in [67,68]).

Step 3: Signal filtering. The inverse function representation of Eq. (40a) may be equivalently re-written as follows (pre-multiplication by A(B,t)\cdot A^{-1}(B,t)):

A(B,t)\cdot \underbrace{A^{-1}(B,t)\cdot C^{-1}(B,t)\cdot A(B,t)\cdot x(t)}_{\tilde x(t)} = e(t) \;\Longrightarrow\; A(B,t)\cdot \tilde x(t) = e(t)   (46)

in terms of the filtered signal \tilde x(t). Once initial estimates of the AR and MA polynomial operators are available (Step 2), the filtered signal \tilde x(t) may be obtained via the following successive filtering operations:

C(B,t;\hat c)\cdot q(t) = A(B,t;\hat a)\cdot x(t), \qquad A(B,t;\hat a)\cdot \tilde x(t) = q(t).   (47)
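For illustration, the successive filtering operations of Eq. (47) may be sketched as below, assuming zero initial conditions and frozen AR/MA parameter trajectories a_i(t), c_i(t) reconstructed from the Step 2 estimates; the function name and array layout are illustrative. As noted in the remarks further below, these recursions may become unstable for certain estimated operators.

```python
import numpy as np

def pa_filtering(x, a_traj, c_traj):
    """Successive time-varying filtering operations of Eq. (47):
    first  C(B,t;c_hat) q(t)       = A(B,t;a_hat) x(t)  ->  q(t),
    then   A(B,t;a_hat) x_tilde(t) = q(t)               ->  x_tilde(t).
    a_traj[t, i-1] and c_traj[t, i-1] hold the trajectories a_i(t), c_i(t)."""
    N, na = a_traj.shape
    nc = c_traj.shape[1]
    q = np.zeros(N)
    x_tilde = np.zeros(N)
    for t in range(N):
        # q(t) = x(t) + sum_i a_i(t) x(t-i) - sum_i c_i(t) q(t-i)
        q[t] = x[t]
        for i in range(1, na + 1):
            if t - i >= 0:
                q[t] += a_traj[t, i - 1] * x[t - i]
        for i in range(1, nc + 1):
            if t - i >= 0:
                q[t] -= c_traj[t, i - 1] * q[t - i]
        # x_tilde(t) = q(t) - sum_i a_i(t) x_tilde(t-i)
        x_tilde[t] = q[t]
        for i in range(1, na + 1):
            if t - i >= 0:
                x_tilde[t] -= a_traj[t, i - 1] * x_tilde[t - i]
    return x_tilde
```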

Step 4: Final AR/MA projection coefficient estimation. The filtered signal \tilde x(t) should theoretically obey the AR representation of Eq. (46). A corresponding model, parameterized in terms of the AR coefficient of projection vector a, is

A(B,t;a)\cdot \tilde x(t) = e(t,a) \;\Longleftrightarrow\; \tilde x(t) = -\sum_{i=1}^{n_a}\sum_{j=1}^{p_a} a_{i,j}\, G_{b_{a_j}}(t)\, \tilde x(t-i) + e(t,a) \;\Longleftrightarrow\; \tilde x(t) = \tilde\varphi_A^T(t)\cdot a + e(t,a)   (48)

with

\tilde\varphi_A(t) \triangleq [\, -G_{b_{a_1}}(t)\, \tilde x(t-1) \;\ldots\; -G_{b_{a_{p_a}}}(t)\, \tilde x(t-n_a) \,]^T_{[n_a p_a\times 1]}.

Since the residual (prediction error) e(t,a) is a linear function of a, estimation of the latter via minimization of the residual sum of squares may be achieved via an ordinary least squares (OLS) estimator.

    The final MA coefficient of projection estimates are subsequently obtained via deconvolution, based upon

    Eq. (45) and the final AR coefficient of projection estimates (details in [67,68]).

Step 5: Innovations variance estimation. The innovations (residual) variance coefficient of projection vector s may be obtained based upon the estimated (using Eq. (33) and the final AR/MA coefficient of projection estimates) FS-TARMA residual series e(t,\hat\vartheta) and the procedure described in the pure TAR case.

Remarks. In this method Steps 3 and 4 may be iterated until a minimum value in the RSS criterion of Eq. (34) is achieved. The method is computationally less intensive than 2SLS, and may (though not necessarily) also provide somewhat improved accuracy. As with 2SLS, in certain cases the functional subspaces of the inverse function parameters (used in Step 1) may become of quite high dimensionalities. Truncation of the inverse function parameter functional subspaces may then be preferable. Yet, it should be kept in mind that the effects of such a truncation may be somewhat more pronounced in the present case due

    to the fact that the inverse function is used again in Step 4 for the estimation of the final MA coefficients

    of projection. An additional factor that may cause difficulty is instabilities in the filtering operations of

    Eqs. (47).

The recursive extended least squares (RELS) method [44,63,64]: The estimation of an FS-TARMA model may also be based upon recursive schemes, similar in nature to the previously discussed recursive maximum likelihood (RML) method. The recursive extended least squares (RELS) method [44, pp. 136-137] is a useful such method and is summarized in the following:

Estimator update:

\hat\vartheta(t) = \hat\vartheta(t-1) + k(t)\,\big[\, x(t) - \varphi^T(t)\cdot\hat\vartheta(t-1) \,\big].   (49a)

Gain:

k(t) = \frac{P(t-1)\cdot\varphi(t)}{1 + \varphi^T(t)\cdot P(t-1)\cdot\varphi(t)}.   (49b)

Covariance update:

P(t) = P(t-1) - \frac{P(t-1)\cdot\varphi(t)\cdot\varphi^T(t)\cdot P(t-1)}{1 + \varphi^T(t)\cdot P(t-1)\cdot\varphi(t)},   (49c)

\varphi(t) \triangleq [\, -G_{b_{a_1}}(t)\, x(t-1) \;\ldots\; -G_{b_{a_{p_a}}}(t)\, x(t-n_a) \;\vdots\; G_{b_{c_1}}(t)\, e(t-1|t-1) \;\ldots\; G_{b_{c_{p_c}}}(t)\, e(t-n_c|t-n_c) \,]^T.   (49d)

A posteriori error:

e(t|t) = x(t) - \varphi^T(t)\cdot\hat\vartheta(t).   (49e)

This estimation method is (recursively) applied to the data record, with the final AR/MA coefficient of projection estimate being equal to \hat\vartheta(N). As with the RML method, for initialization it is customary to set \hat\vartheta(0) = 0, P(0) = \alpha\cdot I (where \alpha stands for a large positive number and I the appropriate unity matrix), and the


    initial signal and a-posteriori error values to zero. Also the recursions on the available signal may be applied in

    sequential phases (for instance a forward pass, a backward pass and a final forward pass) in order to reduce

    the effects of arbitrary initial conditions.
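A minimal sketch of the RELS recursion of Eqs. (49a)-(49e) follows. The function name, the single forward pass, the value of the initialization constant, and the minus-sign convention adopted in the regression vector are illustrative assumptions; in practice the pass may be repeated (forward, backward, forward) as just described in order to reduce the influence of the arbitrary initial conditions.

```python
import numpy as np

def fs_tarma_rels(x, na, nc, pa, pc, G_a, G_c, alpha=1e6):
    """Recursive extended least squares (Eqs. (49a)-(49e)) for an
    FS-TARMA(na, nc) model.  G_a (pa x N) and G_c (pc x N) hold the selected
    AR and MA basis functions evaluated over time; alpha*I initializes P(0)."""
    N = len(x)
    d = na * pa + nc * pc
    theta = np.zeros(d)                   # projection coefficient estimate
    P = alpha * np.eye(d)
    e_post = np.zeros(N)                  # a posteriori errors e(t|t)
    for t in range(N):
        phi = np.zeros(d)                 # regression vector, Eq. (49d)
        for i in range(1, na + 1):
            if t - i >= 0:
                phi[(i - 1) * pa:i * pa] = -x[t - i] * G_a[:, t]
        off = na * pa
        for i in range(1, nc + 1):
            if t - i >= 0:
                phi[off + (i - 1) * pc:off + i * pc] = e_post[t - i] * G_c[:, t]
        Pphi = P @ phi
        k = Pphi / (1.0 + phi @ Pphi)                  # gain, Eq. (49b)
        theta = theta + k * (x[t] - phi @ theta)       # update, Eq. (49a)
        P = P - np.outer(k, Pphi)                      # covariance, Eq. (49c)
        e_post[t] = x[t] - phi @ theta                 # Eq. (49e)
    return theta, e_post
```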

The innovations (residual) variance coefficient of projection vector s may be estimated as in the pure TAR case, based upon the FS-TARMA residual series corresponding to the final estimate \hat\vartheta(N). The residuals are obtained from the TARMA model expression (Eq. (33) for \vartheta = \hat\vartheta(N)), that is:

A(B,t;\hat\vartheta(N))\cdot x(t) = C(B,t;\hat\vartheta(N))\cdot e(t,\hat\vartheta(N)).

Remarks. A forgetting factor \lambda may be introduced into the method for tracking a time-dependent projection coefficient vector \vartheta(t). The gain and covariance update expressions then become:

k(t) = \frac{P(t-1)\cdot\varphi(t)}{\lambda + \varphi^T(t)\cdot P(t-1)\cdot\varphi(t)}, \qquad P(t) = \frac{1}{\lambda}\Big[ P(t-1) - \frac{P(t-1)\cdot\varphi(t)\cdot\varphi^T(t)\cdot P(t-1)}{\lambda + \varphi^T(t)\cdot P(t-1)\cdot\varphi(t)} \Big].

The problems related to the selection of \lambda mentioned in the RML method are applicable. It should be further stressed that the FS-TARMA model structure calls for constant projection coefficients. The relaxation of this constraint in the present case leads to TARMA models with only partially structured parameter evolution.

    3.2. Model structure selection

    Model structure selection refers to the estimation of the proper model structure within a selected model

    class; see Table 2 for a summary of the model structure parameters for each one of the three model classes.

In this Table n_a and n_c designate the AR and MA orders, respectively, k the constraint equation order, F_{AR}, F_{MA} and F_{\sigma_e^2} the AR, MA and innovations variance functional subspaces, respectively, p_a, p_c, p_s their respective dimensionalities, and b_{a_j}, b_{c_j}, b_{s_j} their respective functional basis indices.

Model structure selection is generally based upon either trial-and-error or integer optimization schemes, according to which models corresponding to various candidate structures (within a model class) are

    estimated (via the procedures of the previous subsection), and the one providing the best fitness to the non-

    stationary signal is selected.

    The fitness function may be the Gaussian log-likelihood function of each candidate model. The particular

    model that maximizes it is the most likely to be the actual underlying model responsible for the generation of

    the measured signal, in the sense that it maximizes the probability of having provided the measured signal

    values, and is thus selected. A problem with this approach is that the loglikelihood may be monotonically

    increasing with increasing model structure (that is models of higher orders, subspace dimensionalities,

    and so on), as a result of overfitting the measured signal. For this reason criteria such as the AIC

    (Akaike information criterion [77]) or the BIC (Bayesian information criterion [78]) are generally used


Table 2
Summary of model structure definition for the various TARMA model classes

Model class                          Model structure
Unstructured parameter evolution     M ≜ {n_a, n_c}
Stochastic parameter evolution       M_SP ≜ {n_a, n_c, k}
Deterministic parameter evolution    M_FS ≜ {n_a, n_c, F_AR, F_MA, F_{σ_e²}}
                                        F_AR:     p_a; b_a1, ..., b_apa (a)
                                        F_MA:     p_c; b_c1, ..., b_cpc (a)
                                        F_{σ_e²}: p_s; b_s1, ..., b_sps (a)

(a) For a selected basis function family.


(also see [18, pp. 200-202]). These are of the respective forms:

AIC = -2\,\ln L\big(M(\hat h^N, \hat\sigma_e^{2\,N}) \mid x^N\big) + 2\,d,   (50a)

BIC = -\ln L\big(M(\hat h^N, \hat\sigma_e^{2\,N}) \mid x^N\big) + \frac{\ln N}{2}\, d   (50b)

with L designating the model likelihood, N the number of signal samples, and d the number of independently adjusted (estimated) model parameters. As it may be readily observed, both criteria consist of

    a superposition of the negative log-likelihood function and a term that penalizes the model size (structural

    complexity) and thus discourages model overfitting. Accordingly, the model (and thus the model structure)

    that minimizes the AIC or the BIC is selected. The AIC and BIC criteria may not be formally used in

    connection with methods that directly and recursively estimate the AR/MA parameters at each time instant

    (that is the RML-TARMA, SP-TARMA and SP-TARMA (smoothed) methods). In these cases the

    loglikelihood function may be used with proper caution.

The Gaussian log-likelihood function of the model structure M(\hat h^N, \hat\sigma_e^{2\,N}) given the signal samples x^N is:

\ln L\big(M(\hat h^N, \hat\sigma_e^{2\,N}) \mid x^N\big) = \ln f\big(x^N \mid M(\hat h^N, \hat\sigma_e^{2\,N})\big) = \ln \prod_{t=1}^{N} f\big(e(t) \mid \hat h^N, \hat\sigma_e^{2\,N}\big) = \sum_{t=1}^{N} \ln\Big\{ \big[2\pi\,\sigma_e^2(t)\big]^{-1/2} \exp\Big( -\frac{e^2(t)}{2\,\sigma_e^2(t)} \Big) \Big\}

\;\Longrightarrow\; \ln L\big(M(\hat h^N, \hat\sigma_e^{2\,N}) \mid x^N\big) = -\frac{N}{2}\ln 2\pi - \frac{1}{2}\sum_{t=1}^{N}\Big[ \ln\sigma_e^2(t) + \frac{e^2(t)}{\sigma_e^2(t)} \Big]   (51)

with f designating the Gaussian probability density function. The dependence of the residual e(t) on the parameters \hat h^N and also on \hat\sigma_e^{2\,N} has been dropped in the last two expressions for the sake of simplicity. Finally note that in the FS-TARMA case the loglikelihood will be a function of the projection coefficient vectors \vartheta, s (Eq. (32a)), while the innovations variance \sigma_e^2(t) should be set equal to g^T(t)\cdot s (see Eq. (38)).
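The log-likelihood of Eq. (51) and the criteria of Eqs. (50a)-(50b) may be computed from a model's residual series and innovations variance trajectory as sketched below (in the FS-TARMA case the variance trajectory is g^T(t)·ŝ, Eq. (38)); the helper names are illustrative.

```python
import numpy as np

def gaussian_loglike(residuals, var_traj):
    """Gaussian log-likelihood of Eq. (51) from the model residuals e(t) and
    the (time-dependent) innovations variance sigma_e^2(t)."""
    N = len(residuals)
    return (-0.5 * N * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(var_traj) + residuals ** 2 / var_traj))

def aic_bic(loglike, d, N):
    """AIC and BIC of Eqs. (50a)-(50b); d is the number of independently
    estimated parameters and N the number of signal samples."""
    aic = -2.0 * loglike + 2.0 * d
    bic = -loglike + 0.5 * np.log(N) * d
    return aic, bic
```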

    3.2.1. Search schemes for locating the best fitness model

    Two schemes, an integer optimization scheme and a suboptimal search scheme, are discussed for the most

    general case of FS-TARMA models (which are characterized by the maximum number of structural

parameters; see Table 2). Irrespective of the particular scheme used, the basis function family (such as a

    given polynomial family, or a trigonometric family and so on) is assumed to be pre-selected, and the

    structural parameters to be estimated are those indicated in Table 2. This pre-selection may be based upon

    prior knowledge or physical understanding. In any case, it should be noted that the basis function family

selection is related more to parsimony than to accuracy. This is due to the fact that any family may approximate any given curve with arbitrary accuracy, as long as a sufficient number of basis functions is used [74, p. 77]. Thus, the real issue is the selection of a family that may provide the necessary accuracy with a small (or minimal) number of functions.

    An integer optimization scheme [75,34]: This is a hybrid optimization scheme consisting of two distinct

    phases.

    Phase I. Coarse (global) Optimization: Phase I aims at determining promising subregions of the complete

    search space within which optimal model structures (either in the local or global sense) might be located.

    This is achieved via a genetic algorithm [79] which maximizes the negative AIC/BIC (fitness function). The

    algorithm incorporates: (i) a nonlinear ranking operator, (ii) a stochastic universal sampling operator, (iii) a

    two-point crossover operator, (iv) a mutation operator, and (v) a fitness-based reinsertion operator.

    Phase II. Fine (local) Optimization: Phase II aims at refining the results of phase I and selecting the

    globally optimum structure. It operates in a (suitably defined) neighbourhood of each initial solution (as

    provided by phase I), and is based upon the backward regression concept. It thus starts with maximum values


of the arguments (within the selected neighbourhood) and subsequently reduces either one of the model orders (n_a, n_c), or one of the subspace dimensionalities (p_a, p_c, p_s), until no further reduction in the AIC/BIC is achieved. The procedure is repeated for all initial solutions (phase I results), and the model structure corresponding to the globally optimum AIC/BIC is selected.

Remarks. (i) Obviously, this scheme offers the possibility of fixing certain structural parameters in case they happen to be a priori known. (ii) It also offers the possibility of more exhaustive searches; for instance optimal structural parameters may be sought for each selected model order pair (n_a, n_c), with the overall optimal model being finally selected. (iii) An important advantage of the scheme also is that it is fully automated. (iv) Yet, this may, at the same time and depending upon the

    occasion, be a disadvantage, as the search is exclusively based upon the fitness function (usually one of

    the AIC/BIC criteria) and may lead to overparameterizations which may, in turn, affect the vibration

    analysis accuracy.

A suboptimal search scheme [34,10]: The key characteristic of this scheme is the approximate decomposition of the structure selection problem into two subproblems: (i) the model orders (n_a, n_c) selection subproblem, and (ii) the functional subspaces (p_a, p_c, p_s; b_{a_j}, b_{c_j}, b_{s_j}) selection subproblem.

    Phase I. Model orders selection: In order to isolate the selection of the model orders from that of the

    functional subspaces, their interaction has to be minimized. This may be achieved by ensuring functional

    subspaces adequacy. Toward this end extended (high dimensionality) and complete (in the sense of

    including all consecutive functions up to the subspace dimensionality) functional subspaces are initially

    adopted. Using them, model orders selection may be achieved via trial and error techniques based upon the

    minimization of the fitness function (typically the AIC or the BIC).

    Phase II. Functional subspace selection: The aim of this phase is the optimization of the extended

    (redundant) functional subspaces, in the sense of increasing the representation parsimony without significantly

    reducing model accuracy. This may be accomplished via trial and error techniques detecting excess basis

    functions by using either the fitness function, or the aggregate parameter deviation (APD). The latter

    constitutes a measure for the aggregate deviation of the parameter trajectories of the current model from those

    of the initial model (phase I result):

APD \triangleq \sum_{i=1}^{n_a} \Delta a_i + \sum_{i=1}^{n_c} \Delta c_i + \Delta\sigma \qquad \text{with} \qquad \Delta q_i \triangleq \frac{\sum_{t=1}^{N} |\hat q_i(t) - \tilde q_i(t)|}{\sum_{t=1}^{N} |\hat q_i(t)|}   (52)

where \hat q_i(t) designates the initial model AR/MA/innovations variance parameter trajectories and \tilde q_i(t) the respective trajectories of the currently considered model. Basis functions may be thus consecutively dropped (one at a time) as long as no significant APD values are produced.
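A minimal sketch of the APD computation of Eq. (52) is given below; the dictionary-of-trajectories interface is purely an illustrative choice.

```python
import numpy as np

def apd(init_traj, curr_traj):
    """Aggregate parameter deviation, Eq. (52).  Both arguments map a parameter
    label (e.g. 'a1', ..., 'c1', ..., 'sigma2') to its sampled trajectory over
    t = 1..N, for the initial (phase I) model and for the currently considered
    (reduced) model, respectively."""
    total = 0.0
    for key, q_init in init_traj.items():
        q_curr = curr_traj[key]
        total += np.sum(np.abs(q_init - q_curr)) / np.sum(np.abs(q_init))
    return total
```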

    Remarks. (i) This scheme is suboptimal, in the sense that it may not provide the globally optimal model

    structure. (ii) From a practical standpoint it is, nevertheless, effective, as it is simple to implement, of low

    computational complexity, and flexible in accounting for user provided structural information. (iii) Its

    main limitation is associated with the use of models with extended and complete functional subspaces in

    phase I. These will typically be highly overparameterized, and the estimation of the associated high number of

    coefficients of projection may pose statistical difficulties (in the sense that the number of available signal

    samples may be inadequate for this purpose).

    3.3. Model validation

    Once a model has been obtained, it must be validated. Although this may be based on various criteria,

which may also depend upon the model's intended use, formal validation procedures are typically based upon the posterior examination of the underlying assumptions, such as the model's residual (prediction error) series uncorrelatedness (whiteness) and Gaussianity.

Due to the residual time-dependent variance, the usual residual whiteness tests may not be applicable in the non-stationary case. Yet, a relatively simple test that may be applied is based upon the number of sign changes in the series (residual sign test) [80, pp. 192-198].


Consider the sequence of signs of the model residual series e(t,\hat h(t)). Let then z be the number of runs (that is groups with common sign) and z_1, z_2 the total number of plus and minus signs, respectively, in the series. For instance, a sequence of 17 residual signs may contain z = 8 runs, with a total of z_1 = 9 plus signs and z_2 = 8 minus signs. The sign test examines whether the pattern is unusual for a zero-mean uncorrelated series or not.

This is done by noting that the mean and variance of the variable z designating the number of runs may be estimated as follows in terms of the random variables z_1 and z_2:

\mu = \frac{2\,z_1 z_2}{z_1+z_2} + 1, \qquad \sigma^2 = \frac{2\,z_1 z_2\,(2\,z_1 z_2 - z_1 - z_2)}{(z_1+z_2)^2\,(z_1+z_2-1)}.   (53)

Thus, the idea is to examine whether the sample number of runs z conforms, at a proper risk level \alpha, with its distribution. For large samples (z, z_1, z_2) the tails of this distribution may be approximated by those of Gaussian curves. More specifically, the lower and upper, respectively, tail of the distribution of the test statistics

Z_l = \frac{z - \mu + 1/2}{\sigma}, \qquad Z_u = \frac{z - \mu - 1/2}{\sigma},   (54)

respectively, may be approximated by the corresponding tail of the standard normal distribution N(0,1) (the factor 1/2 in the above is a so-called continuity correction factor that helps compensate for the fact that an actually discrete distribution is approximated by a continuous one). Z_l and Z_u are referred to as the lower and upper, respectively, tail statistics.

If the sample number of runs z conforms with the above, the pattern is usual (the null hypothesis H_0, stating that the residual series is uncorrelated, is valid). Otherwise the pattern is not usual (the alternative hypothesis H_1, stating that the residual series is not uncorrelated, is valid). This may be formally checked by examining the proper (lower or upper) tail statistic, depending upon whether the sample number of runs z is smaller or greater, respectively, than its estimated mean.

[Fig. 2. Graphical representation of the residual sign test used for model validation: lower tail test on Z_l and upper tail test on Z_u against the standard normal density f_N, with the corresponding H_0 / H_1 acceptance regions.]


Hence the following test is formulated at the \alpha risk level (that is, the probability of type I error, namely rejecting the null hypothesis H_0 when it is actually correct, is equal to \alpha):

For z \le \mu:  Z_l \ge Z_\alpha \;\Rightarrow\; H_0 is accepted (the model is valid);  Z_l < Z_\alpha \;\Rightarrow\; H_1 is accepted (the model is not valid).

For z > \mu:  Z_u \le Z_{1-\alpha} \;\Rightarrow\; H_0 is accepted (the model is valid);  Z_u > Z_{1-\alpha} \;\Rightarrow\; H_1 is accepted (the model is not valid),

with Z_\alpha and Z_{1-\alpha} designating the standard normal distribution's \alpha and 1-\alpha critical points, respectively (Z_\alpha is defined such that Prob[Z \le Z_\alpha] = \alpha) (see Fig. 2).
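The complete test may be sketched as follows; the function name and the default risk level are illustrative, and the standard normal critical points are obtained here from SciPy.

```python
import numpy as np
from scipy.stats import norm

def residual_sign_test(residuals, alpha=0.05):
    """Residual sign (runs) test of Eqs. (53)-(54): returns True if the
    hypothesis H0 of an uncorrelated residual series is accepted at risk
    level alpha."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]                       # drop exact zeros, if any
    z1 = np.sum(signs > 0)                          # number of plus signs
    z2 = np.sum(signs < 0)                          # number of minus signs
    z = 1 + np.sum(signs[1:] != signs[:-1])         # number of runs
    mu = 2.0 * z1 * z2 / (z1 + z2) + 1.0            # Eq. (53)
    var = (2.0 * z1 * z2 * (2.0 * z1 * z2 - z1 - z2)
           / ((z1 + z2) ** 2 * (z1 + z2 - 1)))
    sigma = np.sqrt(var)
    if z <= mu:                                     # lower tail test
        Zl = (z - mu + 0.5) / sigma                 # Eq. (54)
        return Zl >= norm.ppf(alpha)                # compare with Z_alpha
    else:                                           # upper tail test
        Zu = (z - mu - 0.5) / sigma
        return Zu <= norm.ppf(1.0 - alpha)          # compare with Z_{1-alpha}
```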

    4. Model-based analysis

    Once a TARMA representation has been obtained, model-based analysis may be performed. This includes

the computation of non-parameterized representations (such as the model's impulse response function, autocovariance function, and appropriate time-frequency distributions), which are now obtained based upon the TARMA representation, as well as frozen modal quantities.

    4.1. The impulse response function

As shown by Cramér [81], any non-stationary stochastic signal x(t) that is purely non-deterministic (deterministic components, such as trends, have been removed) possesses a one-sided (that is causal) representation (convolution representation) of the form

x(t) = \sum_{\tau=-\infty}^{t} h(t,\tau)\, e(\tau), \qquad h(t,t) = 1   (55)

with e(t) designating a zero-mean innovations (uncorrelated) sequence and h(t,\tau) the model's time-dependent weighting or impulse response function. This function is defined as the model's response at time t to a discrete impulse (Kronecker delta) excitation applied at time \tau. The convolution representation may also be written as

x(t) = \sum_{i=0}^{\infty} h_i(t)\, e(t-i) = \sum_{i=0}^{\infty} h(t,t-i)\, e(t-i),   (56)

where, obviously, h_i(t) \triangleq h(t,t-i). Once a TARMA representation has been obtained, the impulse response function may be computed as [41]:

h(t,\tau) = \begin{cases} 0, & t < \tau, \\ c_{t-\tau}(t) - \sum_{j=1}^{t-\tau} a_j(t)\, h(t-j,\tau), & t \ge \tau \end{cases}   (57)

with c_{t-\tau}(t) designating the (t-\tau)-th time-dependent MA parameter (note that c_0(t) = 1, c_j(t) = 0 for j < 0 or j > n_c; similarly a_0(t) = 1, a_j(t) = 0 for j < 0 or j > n_a).
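The recursion of Eq. (57), together with the normalization discussed next (Eq. (58)), may be sketched as follows; the dense (t, tau) matrix storage and the function name are illustrative choices, and the computation assumes that the parameter trajectories a_i(t), c_i(t) have already been reconstructed from an estimated TARMA model.

```python
import numpy as np

def tarma_impulse_response(a_traj, c_traj, sigma_e=None):
    """Impulse response h(t,tau) of a TARMA model via the recursion of Eq. (57),
    with a_traj[t, i-1] = a_i(t) and c_traj[t, i-1] = c_i(t).  If the innovations
    standard deviation sigma_e(t) is supplied, the normalized response
    h_tilde(t,tau) = h(t,tau) * sigma_e(tau) is returned instead."""
    N, na = a_traj.shape
    nc = c_traj.shape[1]

    def c(j, t):                           # c_j(t), with c_0 = 1, c_j = 0 outside 1..nc
        if j == 0:
            return 1.0
        return c_traj[t, j - 1] if 1 <= j <= nc else 0.0

    h = np.zeros((N, N))                   # rows: t, columns: tau (zero for t < tau)
    for tau in range(N):
        for t in range(tau, N):
            val = c(t - tau, t)
            for j in range(1, min(t - tau, na) + 1):   # a_j(t) = 0 for j > na
                val -= a_traj[t, j - 1] * h[t - j, tau]
            h[t, tau] = val
    if sigma_e is not None:
        h = h * sigma_e[np.newaxis, :]     # scale column tau by sigma_e(tau)
    return h
```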

Note that it is sometimes appropriate to consider the impulse response function \tilde h(t,\tau) of the (innovations variance) normalized TARMA representation:

A(B,t)\cdot x(t) = C(B,t)\cdot e(t) \;\Longleftrightarrow\; A(B,t)\cdot x(t) = \underbrace{C(B,t)\cdot\sigma_e(t)}_{\bar C(B,t)}\cdot \bar e(t) \;\Longrightarrow\; x(t) = \sum_{\tau=-\infty}^{t} \tilde h(t,\tau)\, \bar e(\tau)   (58)

with \sigma_e(t) designating the innovations standard deviation and \bar e(t) \triangleq e(t)/\sigma_e(t) the innovations sequence normalized to unit variance. This normalization allows the system's time-dependent gain \sigma_e(t) to be incorporated into the impulse response function. Obviously \tilde h(t,\tau) = h(t,\tau)\cdot\sigma_e(\tau).


    4.2. The autocovariance function

The signal's autocovariance function may be obtained via the normalized model's impulse response function as follows:

\gamma(t_1,t_2) = E\{x(t_1)\, x(t_2)\} = E\Big\{ \sum_{i=-\infty}^{t_1} \tilde h(t_1,i)\, \bar e(i) \cdot \sum_{j=-\infty}^{t_2} \tilde h(t_2,j)\, \bar e(j) \Big\} \;\Longrightarrow\; \gamma(t_1,t_2) = \sum_{i=-\infty}^{\min(t_1,t_2)} \tilde h(t_1,i)\, \tilde h(t_2,i)   (59a)

or, equivalently, by defining t_1 \triangleq t, t_2 \triangleq t+\tau:

\gamma(t,t+\tau) = \sum_{i=-\infty}^{\min(t,t+\tau)} \tilde h(t,i)\, \tilde h(t+\tau,i).   (59b)
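For illustration, with the normalized impulse response arranged as a lower-triangular matrix (zero initial conditions assumed, so the summation effectively starts at i = 0), the autocovariance of Eq. (59a) reduces to a single matrix product, as sketched below with an artificial exponentially decaying response.

```python
import numpy as np

def tarma_autocovariance(h_tilde):
    """Autocovariance gamma(t1,t2) of Eq. (59a) from the normalized impulse
    response matrix h_tilde[t, tau] (zero for tau > t):
    gamma(t1,t2) = sum_i h_tilde(t1,i) * h_tilde(t2,i) = (H H^T)[t1,t2]."""
    return h_tilde @ h_tilde.T

if __name__ == "__main__":
    # artificial example: h_tilde(t,tau) = 0.9**(t-tau) for tau <= t
    N = 200
    t = np.arange(N)
    h = np.tril(0.9 ** (t[:, None] - t[None, :]))
    gamma = tarma_autocovariance(h)
    print(gamma.shape, gamma[100, 100])             # signal variance at t = 100
```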

    4.3. Time frequency distributions

The well-known, in the stationary case, notion of power spectral density has no direct counterpart in the non-stationary case.

    A notion of frequency response may be introduced in a way that is analogous to that of time-invariant

    systems, yet it lacks the important properties and physical significance of the latter. Indeed, the frequency

response function (frf) may be defined as the system's response to a complex exponential excitation e^{j\omega T_s t} divided by the excitation [82]:

H(e^{j\omega T_s}, t) \triangleq \frac{\text{response of the system to } e^{j\omega T_s t}}{e^{j\omega T_s t}}

with j designating the imaginary unit, \omega frequency in rad/time unit, and T_s the sampling period. Using the model's impulse response function h(t,t-i) and the convolution relationship, this may be expressed as

H(e^{j\omega T_s}, t) = \frac{1}{e^{j\omega T_s t}} \sum_{i=0}^{\infty} h(t,t-i)\, e^{j\omega T_s (t-i)} = \sum_{i=0}^{\infty} h(t,t-i)\, e^{-j\omega T_s i}.   (60)

Evidently, the frequency response function is the Fourier transform of h(t,t-i) with respect to i. Yet, it has been shown [82] that it can be expressed as a rational function of the model parameters only in the (very restrictive) case that the AR parameters are independent of time, while no analytic closed-form expression for its evaluation based upon the model parameters is available.

Note that in the above, the impulse response function \tilde h(t,\tau) of the normalized TARMA representation should be employed in order for the system's time-dependent gain to be accounted for.

    The difficulties associated with defining a notion of frequency response carry on to the definition of a power

    spectral density function that is valid at each time instant (the concept of time-frequency distribution; also see

    the discussion in the initial part of Section 2).

One possible approach is to employ the concept of local autocovariance \gamma(t+\tau, t-\tau) (Eq. (3)). The Wigner-Ville distribution is then defined as the Fourier transform of the local autocovariance with respect to \tau (t considered fixed) [1, pp. 143, 159; 5, pp. 145-146; 37; 6; 3, p. 504], that is (compare with the continuous-time definition of Eq. (5)):

S_{WV}(\omega,t) = \sum_{\tau=-\infty}^{\infty} \gamma(t+\tau, t-\tau)\, e^{-j\omega T_s \tau}.   (61)

    This bears a superficial resemblance to the classical definition of the power spectral density for a stationary

    signal, yet it may produce negative values.

    Alternative power spectral density functions may be derived by using other analogies to the stationary case.

The Mélard-Tjøstheim power spectral density [83,84,41], [1, pp. 160-161] (also referred to as the evolutive


    power spectral density) is defined as

S_{MT}(\omega,t) = \Big| \sum_{\tau=-\infty}^{t} \tilde h(t,\tau)\, e^{-j\omega T_s (t-\tau)} \Big|^2.   (62)
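A direct sketch of Eq. (62) over a frequency grid follows; the brute-force double loop and the interface (the normalized impulse response matrix as input, for instance as produced by the recursion sketched after Eq. (57)) are illustrative choices, and zero initial conditions are again assumed so that the summation starts at tau = 0.

```python
import numpy as np

def melard_tjostheim_psd(h_tilde, omegas, Ts=1.0):
    """Melard-Tjostheim (evolutive) power spectral density, Eq. (62):
    S_MT(omega, t) = | sum_tau h_tilde(t,tau) exp(-1j*omega*Ts*(t - tau)) |^2,
    with h_tilde[t, tau] the normalized impulse response (zero for tau > t)
    and omegas the frequency grid in rad per time unit."""
    N = h_tilde.shape[0]
    t = np.arange(N)
    S = np.empty((len(omegas), N))
    for k, w in enumerate(omegas):
        for tt in range(N):
            lags = tt - t[:tt + 1]                      # t - tau for tau = 0..t
            H = np.sum(h_tilde[tt, :tt + 1] * np.exp(-1j * w * Ts * lags))
            S[k, tt] = np.abs(H) ** 2
    return S
```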

Taking the magnitude squared of the normalized model's frequency response function \tilde H(e^{j\omega T_s}, t)