Introduction

A considerable number of problems in the statistics of random processes are formulated within the following scheme.

On a certain probability space $(\Omega, \mathcal{F}, P)$ a partially observable random process $(\theta, \xi) = (\theta_t, \xi_t)$, $t \ge 0$, is given, of which only the second component $\xi = (\xi_t)$, $t \ge 0$, is observed. At any time $t$ it is required, based on $\xi_0^t = \{\xi_s,\ 0 \le s \le t\}$, to estimate the unobservable state $\theta_t$. This problem of estimating (in other words, the filtering problem) $\theta_t$ from $\xi_0^t$ will be discussed in this book.

It is well known that if $M\theta_t^2 < \infty$, then the optimal mean square estimate of $\theta_t$ from $\xi_0^t$ is the a posteriori mean $m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$, where $\mathcal{F}_t^\xi = \sigma\{\omega : \xi_s,\ s \le t\}$ is the $\sigma$-algebra generated by $\xi_0^t$. Therefore, the solution of the problem of optimal (in the mean square sense) filtering reduces to finding the conditional (mathematical) expectation $m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$.
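(A standard one-line verification: for any other $\mathcal{F}_t^\xi$-measurable estimate $\lambda_t = \lambda_t(\xi)$ with $M\lambda_t^2 < \infty$,

$$M(\theta_t - \lambda_t)^2 = M(\theta_t - m_t)^2 + M(m_t - \lambda_t)^2 \ge M(\theta_t - m_t)^2,$$

since the cross term $2M[(\theta_t - m_t)(m_t - \lambda_t)] = 2M\{(m_t - \lambda_t)\,M[\theta_t - m_t \mid \mathcal{F}_t^\xi]\}$ vanishes.)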

In principle, the conditional expectation $M(\theta_t \mid \mathcal{F}_t^\xi)$ can be computed by the Bayes formula. However, even in many rather simple cases, the expressions obtained from the Bayes formula are too cumbersome and present difficulties in their practical application as well as in the investigation of the structure and properties of the solution.

From a computational point of view it is desirable that the formulae defining the filter $m_t$, $t \ge 0$, should be of a recurrent nature. Roughly speaking, this means that $m_{t+\Delta}$, $\Delta > 0$, must be built up from $m_t$ and the observations $\xi_t^{t+\Delta} = \{\xi_s : t \le s \le t + \Delta\}$. In the discrete case $t = 0, 1, 2, \ldots$, the simplest form of such a recurrence relation may be, for example, the equation

(1)

where $\Delta m_t = m_{t+1} - m_t$. In the case of continuous time, $t \ge 0$, stochastic differential equations

(2)

have such a form.

It is evident that without special assumptions concerning the structure of the process $(\theta, \xi)$ it is difficult to expect that the optimal values $m_t$ should satisfy recurrence relations of the types given by (1) and (2). Therefore,



before describing the structure of the process $(\theta, \xi)$ whose filtering problems are investigated in this book, we shall study a few specific examples.

Let $\theta$ be a Gaussian random variable with $M\theta = m$, $D\theta = \gamma$, which for short will be written $\theta \sim N(m, \gamma)$. Assume that the sequence

$$\xi_t = \theta + \varepsilon_t, \qquad t = 1, 2, \ldots, \tag{3}$$

is observed, where $\varepsilon_1, \varepsilon_2, \ldots$ is a sequence of mutually independent Gaussian random variables with zero mean and unit variance, independent also of $\theta$. Using a theorem on normal correlation (Theorem 13.1) it is easily shown that $m_t = M(\theta \mid \xi_1, \ldots, \xi_t)$ and the tracking error $\gamma_t = M(\theta - m_t)^2$ are given by

$$m_t = \frac{m + \gamma(\xi_1 + \cdots + \xi_t)}{1 + t\gamma}, \tag{4}$$

$$\gamma_t = \frac{\gamma}{1 + t\gamma}. \tag{5}$$

From this we obtain the following recurrence equations for $m_t$ and $\gamma_t$:

$$\Delta m_t = \frac{\gamma_t}{1 + \gamma_t}\,[\xi_{t+1} - m_t], \qquad \Delta\gamma_t = -\frac{\gamma_t^2}{1 + \gamma_t}, \tag{6}$$

where $\Delta m_t = m_{t+1} - m_t$, $\Delta\gamma_t = \gamma_{t+1} - \gamma_t$.
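As a numerical illustration of the recursions (6), here is a minimal simulation sketch (the parameter values and the sample size are arbitrary choices, and NumPy is assumed); it iterates (6) and compares the result with the closed-form expressions (4) and (5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters: theta ~ N(m0, g0), observations xi_t = theta + eps_t.
m0, g0, T = 1.0, 2.0, 50
theta = rng.normal(m0, np.sqrt(g0))
xi = theta + rng.standard_normal(T)      # xi_1, ..., xi_T, unit observation noise

# Recursions (6).
m, g = m0, g0
for t in range(T):
    k = g / (1.0 + g)                    # gain gamma_t / (1 + gamma_t)
    m += k * (xi[t] - m)                 # Delta m_t = k (xi_{t+1} - m_t)
    g -= k * g                           # Delta gamma_t = -gamma_t^2 / (1 + gamma_t)

# Closed forms (4), (5).
m_closed = (m0 + g0 * xi.sum()) / (1.0 + T * g0)
g_closed = g0 / (1.0 + T * g0)
print(m, m_closed)                       # agree up to rounding error
print(g, g_closed)
```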

Let us make this example more complicated. Let $\theta$ and $\varepsilon_1, \varepsilon_2, \ldots$ be the same as in the previous example, and let the observable process $\xi_t$, $t = 1, 2, \ldots$, be defined by the relations

$$\xi_{t+1} = A_0(t, \xi) + A_1(t, \xi)\theta + \varepsilon_{t+1}, \tag{7}$$

where the functions $A_0(t, \xi)$ and $A_1(t, \xi)$ are assumed to be $\mathcal{F}_t^\xi$-measurable (i.e., $A_0(t, \xi)$ and $A_1(t, \xi)$ at any time $t$ depend only on the values $(\xi_0, \ldots, \xi_t)$), $\mathcal{F}_t^\xi = \sigma\{\omega : \xi_0, \ldots, \xi_t\}$.

The necessity to consider the coefficients $A_0(t, \xi)$ and $A_1(t, \xi)$ for all 'past history' values $(\xi_0, \ldots, \xi_t)$ arises, for example, in control problems (Section 14.3), where these coefficients play the role of 'controlling' actions, and also in problems of information theory (Section 16.4), where the pair of functions $(A_0(t, \xi), A_1(t, \xi))$ is treated as 'coding' using noiseless feedback.

It turns out that for the scheme given by (7) the optimal value $m_t = M(\theta \mid \mathcal{F}_t^\xi)$ and the conditional variance $\gamma_t = M[(\theta - m_t)^2 \mid \mathcal{F}_t^\xi]$ also satisfy recurrence equations (see Section 13.5):

$$\Delta m_t = \frac{\gamma_t A_1(t, \xi)}{1 + A_1^2(t, \xi)\gamma_t}\,\bigl(\xi_{t+1} - A_0(t, \xi) - A_1(t, \xi)m_t\bigr), \qquad m_0 = m; \tag{8}$$

$$\Delta\gamma_t = -\frac{A_1^2(t, \xi)\gamma_t^2}{1 + A_1^2(t, \xi)\gamma_t}, \qquad \gamma_0 = \gamma. \tag{9}$$
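Note, as a consistency check, that for $A_0(t, \xi) \equiv 0$ and $A_1(t, \xi) \equiv 1$ the scheme (7) turns into (3), and Equations (8) and (9) reduce to the recursions (6):

$$\Delta m_t = \frac{\gamma_t}{1 + \gamma_t}\,[\xi_{t+1} - m_t], \qquad \Delta\gamma_t = -\frac{\gamma_t^2}{1 + \gamma_t}.$$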


In the schemes given by (3) and (7) the question is, in essence, a traditional problem of mathematical statistics: Bayes estimation of a random parameter $\theta$ from the observations $\xi_1, \ldots, \xi_t$. The next step in making the scheme given by (7) more complicated is to consider a random process $\theta_t$ rather than a random variable $\theta$.

Assume that the random process $(\theta, \xi) = (\theta_t, \xi_t)$, $t = 0, 1, \ldots$, is described by the recurrence equations

$$\theta_{t+1} = a_0(t, \xi) + a_1(t, \xi)\theta_t + b(t, \xi)\varepsilon_1(t+1),$$

$$\xi_{t+1} = A_0(t, \xi) + A_1(t, \xi)\theta_t + B(t, \xi)\varepsilon_2(t+1), \tag{10}$$

where $\varepsilon_1(t), \varepsilon_2(t)$, $t = 1, 2, \ldots$, is a sequence of independent, normally distributed, $N(0, 1)$, random variables, independent also of $(\theta_0, \xi_0)$. The coefficients $a_0(t, \xi), \ldots, B(t, \xi)$ are assumed to be $\mathcal{F}_t^\xi$-measurable for any $t = 0, 1, \ldots$.

In order to obtain recurrence equations for the estimate $m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$ and the conditional variance $\gamma_t = M\{[\theta_t - m_t]^2 \mid \mathcal{F}_t^\xi\}$, let us assume that the conditional distribution $P(\theta_0 \le x \mid \xi_0)$ is (for almost all $\xi_0$) normal, $N(m, \gamma)$. The essence of this assumption is that it permits us to prove (see Chapter 13) that the sequence $(\theta, \xi)$ satisfying (10) is conditionally Gaussian. This means, in particular, that the conditional distribution $P(\theta_t \le x \mid \mathcal{F}_t^\xi)$ is (almost surely) Gaussian. But such a distribution is characterized by only its two conditional moments $m_t$ and $\gamma_t$, leading to the following closed system of equations:

$$m_{t+1} = a_0 + a_1 m_t + \frac{a_1 A_1 \gamma_t}{B^2 + A_1^2\gamma_t}\,[\xi_{t+1} - A_0 - A_1 m_t], \qquad m_0 = m;$$

$$\gamma_{t+1} = [a_1^2\gamma_t + b^2] - \frac{(a_1 A_1 \gamma_t)^2}{B^2 + A_1^2\gamma_t}, \qquad \gamma_0 = \gamma \tag{11}$$

(in the coefficients $a_0, \ldots, B$, for the sake of simplicity, the arguments $t$ and $\xi$ are omitted).
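For illustration, the following minimal sketch (NumPy assumed; the coefficient functions below are arbitrary $\mathcal{F}_t^\xi$-measurable choices made only for this example) simulates one trajectory of the system (10) and runs the recursions (11) along it:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# Arbitrary F_t^xi-measurable coefficients, chosen only for illustration;
# each may depend on the observations (xi_0, ..., xi_t).
def a0(t, xi): return 0.1
def a1(t, xi): return 0.9
def b(t, xi):  return 0.5
def A0(t, xi): return 0.2 * xi[t]
def A1(t, xi): return 1.0
def B(t, xi):  return 1.0

xi = np.zeros(T + 1)                         # xi_0 = 0 for simplicity
m, g = 0.0, 1.0                              # P(theta_0 <= x | xi_0) assumed N(m, g)
theta = rng.normal(m, np.sqrt(g))

sq_err, variances = [], []
for t in range(T):
    # One step of the system (10).
    new_theta = a0(t, xi) + a1(t, xi) * theta + b(t, xi) * rng.standard_normal()
    xi[t + 1] = A0(t, xi) + A1(t, xi) * theta + B(t, xi) * rng.standard_normal()

    # One step of the filter (11).
    denom = B(t, xi) ** 2 + A1(t, xi) ** 2 * g
    innov = xi[t + 1] - A0(t, xi) - A1(t, xi) * m
    new_m = a0(t, xi) + a1(t, xi) * m + a1(t, xi) * A1(t, xi) * g / denom * innov
    new_g = a1(t, xi) ** 2 * g + b(t, xi) ** 2 - (a1(t, xi) * A1(t, xi) * g) ** 2 / denom

    theta, m, g = new_theta, new_m, new_g
    sq_err.append((theta - m) ** 2)
    variances.append(g)

# On average, the realized squared error tracks the computed conditional variance.
print(np.mean(sq_err), np.mean(variances))
```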

The equations in (11) are deduced (in a somewhat more general framework) in Chapter 13. Their deduction does not need anything except the theorem on normal correlation. In this chapter, equations for optimal estimation in extrapolation problems (estimating $\theta_\tau$ from $\xi_0^t$ when $\tau > t$) and interpolation problems (estimating $\theta_\tau$ from $\xi_0^t$ when $\tau < t$) are derived. Chapter 14 deals with applications of these equations to various statistical problems of random sequences, to control problems, and to problems of constructing pseudo-solutions to linear algebraic systems.

These two chapters can be read independently of the rest of the book, and this is where the reader should start if he/she is interested in nonlinear filtering problems but is not sufficiently well acquainted with the general theory of random processes.

The main part of the book concerns problems of optimal filtering and control (and also related problems of interpolation, extrapolation, sequential estimation, testing of hypotheses, etc.) in the case of continuous time. These


problems are interesting per se; and, in addition, simple formulations and compact formulae can be obtained for them. It should be added here that it is often easier, at first, to study the continuous analog of problems formulated for discrete time, and then to use the results obtained in the solution of the latter.

The simplicity of formulation in the case of continuous time is, however, not easy to achieve: rather complicated techniques from the theory of random processes have to be invoked. Later, we shall discuss the methods and the techniques used in this book in more detail, but here we consider particular cases of the filtering problem for the sake of illustration.

Assume that the partially observable random process $(\theta, \xi) = (\theta_t, \xi_t)$, $t \ge 0$, is Gaussian, governed by the stochastic differential equations (compare with the system (10)):

$$d\theta_t = a(t)\theta_t\,dt + b(t)\,dw_1(t), \qquad d\xi_t = A(t)\theta_t\,dt + B(t)\,dw_2(t), \qquad \theta_0 \equiv 0, \tag{12}$$

where $w_1(t)$ and $w_2(t)$ are standard Wiener processes, mutually independent and independent of $(\theta_0, \xi_0)$, and $B(t) \ge C > 0$. Let us consider the component $\theta = (\theta_t)$, $t \ge 0$, as unobservable. The filtering problem is that of optimal estimation of $\theta_t$ from $\xi_0^t$ in the mean square sense for any $t \ge 0$.

The process $(\theta, \xi)$ is, according to our assumption, Gaussian; hence the optimal estimate $m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$ depends linearly on $\xi_0^t = \{\xi_s : s \le t\}$. More precisely, there exists (Lemma 10.1) a function $G(t, s)$ with $\int_0^t G^2(t, s)\,ds < \infty$, $t > 0$, such that (almost surely)

$$m_t = \int_0^t G(t, s)\,d\xi_s. \tag{13}$$

If this expression is formally differentiated, we obtain

$$dm_t = G(t, t)\,d\xi_t + \left(\int_0^t \frac{\partial G(t, s)}{\partial t}\,d\xi_s\right)dt. \tag{14}$$

The right-hand side of this equation can be transformed using the fact that the function G(t, s) satisfies the Wiener-Hopf equation (see (10.25)), which in our case reduces to

$$\frac{\partial G(t, s)}{\partial t} = \left[a(t) - \gamma_t\,\frac{A^2(t)}{B^2(t)}\right]G(t, s), \qquad t > s, \tag{15}$$

$$G(s, s) = \frac{\gamma_s A(s)}{B^2(s)}, \qquad \gamma_s = M[\theta_s - m_s]^2. \tag{16}$$

Taking into account (15) and (14), we infer that the optimal estimate $m_t$, $t > 0$, satisfies the linear stochastic differential equation

$$dm_t = a(t)m_t\,dt + \frac{\gamma_t A(t)}{B^2(t)}\,[d\xi_t - A(t)m_t\,dt]. \tag{17}$$


This equation involves the tracking error $\gamma_t = M[\theta_t - m_t]^2$, which in turn is the solution of the Riccati equation

$$\dot\gamma_t = 2a(t)\gamma_t - \frac{A^2(t)\gamma_t^2}{B^2(t)} + b^2(t). \tag{18}$$

(Equation (18) is easy to obtain by applying the Itô formula for substitution of variables to the square of the process $\theta_t - m_t$, followed by a posteriori averaging.)
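As an illustration only (an Euler-discretization sketch with arbitrary constant coefficients, not a construction used in the book), Equations (12), (17) and (18) can be simulated jointly on a grid of step $\delta$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary constant coefficients a, b, A, B, chosen only for illustration.
a, b, A, B = -0.5, 1.0, 1.0, 1.0
delta, T = 1e-3, 10.0
n = int(T / delta)

theta, m, g = 0.0, 0.0, 0.0        # theta_0 = 0, hence m_0 = 0 and gamma_0 = 0
sq_err, variances = [], []

for _ in range(n):
    dw1 = np.sqrt(delta) * rng.standard_normal()
    dw2 = np.sqrt(delta) * rng.standard_normal()

    # System (12): the observation increment uses theta at the left endpoint.
    dxi = A * theta * delta + B * dw2
    theta += a * theta * delta + b * dw1

    # Filter (17) and Riccati equation (18), Euler scheme.
    m += a * m * delta + (g * A / B ** 2) * (dxi - A * m * delta)
    g += (2 * a * g - (A ** 2) * g ** 2 / B ** 2 + b ** 2) * delta

    sq_err.append((theta - m) ** 2)
    variances.append(g)

# The time-averaged realized squared error should be close to the averaged gamma_t.
print(np.mean(sq_err), np.mean(variances))
```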

Let us discuss Equation (17) in more detail, taking, for simplicity, $\xi_0 \equiv 0$. Denote

$$\overline{W}_t = \int_0^t \frac{d\xi_s - A(s)m_s\,ds}{B(s)}. \tag{19}$$

Then Equation (17) can be rewritten:

$$dm_t = a(t)m_t\,dt + \frac{\gamma_t A(t)}{B(t)}\,d\overline{W}_t. \tag{20}$$

The process $(\overline{W}_t)$, $t \ge 0$, is rather remarkable and plays a key role in filtering problems. The point is that, first, this process turns out to be a Wiener process (with respect to the $\sigma$-algebras $(\mathcal{F}_t^\xi)$, $t \ge 0$) and, second, it contains the same information as the process $\xi$. More precisely, this means that for all $t \ge 0$ the $\sigma$-algebras $\mathcal{F}_t^{\overline{W}} = \sigma\{\omega : \overline{W}_s,\ s \le t\}$ and $\mathcal{F}_t^\xi = \sigma\{\omega : \xi_s,\ s \le t\}$ coincide:

$$\mathcal{F}_t^{\overline{W}} = \mathcal{F}_t^\xi, \qquad t \ge 0 \tag{21}$$

(see Theorem 7.16). By virtue of these properties of the process $\overline{W}$ it is referred to as the innovation process.
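Heuristically, the first property can be seen by substituting the second equation of (12) into (19):

$$d\overline{W}_t = \frac{A(t)(\theta_t - m_t)}{B(t)}\,dt + dw_2(t),$$

so that $M(d\overline{W}_t \mid \mathcal{F}_t^\xi) = 0$ and $(d\overline{W}_t)^2 = dt$; thus $\overline{W}$ is a continuous square integrable martingale with respect to $(\mathcal{F}_t^\xi)$ with $\langle\overline{W}\rangle_t = t$, and hence a Wiener process by Lévy's characterization. (This is only a sketch of the argument; the precise statement is Theorem 7.16.)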

The equivalence of the $\sigma$-algebras $\mathcal{F}_t^\xi$ and $\mathcal{F}_t^{\overline{W}}$ suggests that for $m_t$ not only is the representation (13) valid but so also is the representation

$$m_t = \int_0^t F(t, s)\,d\overline{W}_s, \tag{22}$$

where $\overline{W} = (\overline{W}_t)$, $t \ge 0$, is the innovation process and the functions $F(t, s)$ are such that $\int_0^t F^2(t, s)\,ds < \infty$. In the main part of the text (Theorem 7.16) it is shown that the representation given by (22) can actually be obtained from results on the structure of functionals of diffusion-type processes. Equation (20) can be deduced in a simpler way from the representation given by (22) than from the representation given by (13). It should be noted, however, that the proof of (22) is more difficult than that of (13).

In this example, the optimal (Kalman-Bucy) filter was linear because of the assumption that the process $(\theta, \xi)$ is Gaussian. Let us now take an example where the optimal filter is nonlinear.

Let $(\theta_t)$, $t \ge 0$, be a Markov process starting at zero, with two states 0 and 1 and the only transition $0 \to 1$ at a random moment $\sigma$ distributed (due to the assumed Markov behavior) exponentially: $P(\sigma > t) = e^{-\lambda t}$, $\lambda > 0$. Assume that the observable process $\xi = (\xi_t)$, $t \ge 0$, has the differential


$$d\xi_t = \theta_t\,dt + dw_t, \tag{23}$$

where $w = (w_t)$, $t \ge 0$, is a Wiener process independent of the process $\theta = (\theta_t)$, $t \ge 0$.

We shall interpret the transition of the process $\theta$ from the 'zero' state into the 'unit' state as the occurrence of a discontinuity (at the moment $\sigma$). There arises the following problem: to determine at any time $t > 0$ from the observations $\xi_0^t$ whether or not the discontinuity has occurred before this moment.

Denote $\pi_t = P(\theta_t = 1 \mid \mathcal{F}_t^\xi) = P(\sigma \le t \mid \mathcal{F}_t^\xi)$. It is evident that $\pi_t = m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$. Therefore, the a posteriori probability $\pi_t$, $t \ge 0$, is the optimal (in the mean square sense) estimate of the state of the unobservable process $\theta = (\theta_t)$, $t \ge 0$.

For the a posteriori probability $\pi_t$, $t \ge 0$, we can deduce (using, for example, the Bayes formula and results on the derivative of the measure corresponding to the process $\xi$ with respect to the Wiener measure) the following stochastic differential equation:

$$d\pi_t = \lambda(1 - \pi_t)\,dt + \pi_t(1 - \pi_t)[d\xi_t - \pi_t\,dt], \qquad \pi_0 = 0. \tag{24}$$

It should be emphasized that whereas in the Kalman-Bucy scheme the optimal filter is linear, Equation (24) is essentially nonlinear. Equation (24) defines the optimal nonlinear filter.
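A minimal simulation sketch of this 'disorder' problem (illustrative only; the value of $\lambda$, the step size and the clipping to $[0, 1]$ are ad hoc choices for the Euler discretization of (24)):

```python
import numpy as np

rng = np.random.default_rng(3)

lam, delta, T = 2.0, 1e-3, 3.0            # arbitrary illustrative values
n = int(T / delta)
sigma = rng.exponential(1.0 / lam)        # moment of the jump 0 -> 1, P(sigma > t) = exp(-lam t)

pi = 0.0
path = []
for k in range(n):
    t = k * delta
    theta = 1.0 if t >= sigma else 0.0
    dxi = theta * delta + np.sqrt(delta) * rng.standard_normal()   # Equation (23)

    # Euler step for Equation (24); clipping keeps the discretized pi inside [0, 1].
    pi += lam * (1.0 - pi) * delta + pi * (1.0 - pi) * (dxi - pi * delta)
    pi = min(max(pi, 0.0), 1.0)
    path.append(pi)

# pi_t stays small before sigma and rises towards 1 after the jump.
print("jump at", sigma, "final a posteriori probability", path[-1])
```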

As in the previous example, the (innovation) process

$$\overline{W}_t = \int_0^t [d\xi_s - \pi_s\,ds], \qquad t \ge 0,$$

turns out to be a Wiener process and $\mathcal{F}_t^{\overline{W}} = \mathcal{F}_t^\xi$, $t \ge 0$. Therefore, Equation (24) can be written in the following equivalent form:

$$d\pi_t = \lambda(1 - \pi_t)\,dt + \pi_t(1 - \pi_t)\,d\overline{W}_t, \qquad \pi_0 = 0. \tag{25}$$

It appears that all these examples are within the following general scheme adopted in this book.

Let $(\Omega, \mathcal{F}, P)$ be a certain probability space with a distinguished nondecreasing family of $\sigma$-algebras $(\mathcal{F}_t)$, $t \ge 0$ ($\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$, $s \le t$). On this probability space we are given a partially observable process $(\theta_t, \xi_t)$, $t \ge 0$, and an estimated process $(h_t)$, $t \ge 0$, which depends, generally speaking, both on the unobservable process $\theta_t$, $t \ge 0$, and on the observable component $(\xi_t)$, $t \ge 0$.

As to the observable process$^1$ $\xi = (\xi_t, \mathcal{F}_t)$, it will be assumed that it permits a stochastic differential

$$d\xi_t = A_t(\omega)\,dt + dw_t, \tag{26}$$

where $w = (w_t, \mathcal{F}_t)$, $t \ge 0$, is a standard Wiener process (i.e., a square integrable martingale with continuous trajectories, with $M[(w_t - w_s)^2 \mid \mathcal{F}_s] = t - s$, $t \ge s$, and $w_0 = 0$), and $A = (A_t(\omega), \mathcal{F}_t)$, $t \ge 0$, is a certain integrable random process$^2$.

$^1$ $\xi = (\xi_t, \mathcal{F}_t)$ indicates that the values $\xi_t$ are $\mathcal{F}_t$-measurable for any $t \ge 0$.

The structure of the unobservable process $\theta = (\theta_t, \mathcal{F}_t)$, $t \ge 0$, is not specified directly, but it is assumed that the estimated process $h = (h_t, \mathcal{F}_t)$, $t \ge 0$, permits the following representation:

$$h_t = h_0 + \int_0^t a_s(\omega)\,ds + x_t, \qquad t \ge 0, \tag{27}$$

where $a = (a_t(\omega), \mathcal{F}_t)$, $t \ge 0$, is some integrable process, and $x = (x_t, \mathcal{F}_t)$, $t \ge 0$, is a square integrable martingale.

For any integrable process $g = (g_t, \mathcal{F}_t)$, $t \ge 0$, write $\pi_t(g) = M[g_t \mid \mathcal{F}_t^\xi]$. Then, if $Mg_t^2 < \infty$, $\pi_t(g)$ is the optimal (in the mean square sense) estimate of $g_t$ from $\xi_0^t = \{\xi_s : s \le t\}$.

One of the main results of this book (Theorem 8.1) states that for $\pi_t(h)$ the following representation holds:

$$\pi_t(h) = \pi_0(h) + \int_0^t \pi_s(a)\,ds + \int_0^t \pi_s(D)\,d\overline{W}_s + \int_0^t [\pi_s(hA) - \pi_s(h)\pi_s(A)]\,d\overline{W}_s. \tag{28}$$

Here $\overline{W} = (\overline{W}_t, \mathcal{F}_t^\xi)$, $t \ge 0$, is a Wiener process (compare with the innovation processes in the two previous examples), and the process $D = (D_t, \mathcal{F}_t)$, $t \ge 0$, characterizes the correlation between the Wiener process $w = (w_t, \mathcal{F}_t)$, $t \ge 0$, and the martingale $x = (x_t, \mathcal{F}_t)$, $t \ge 0$. More precisely, the process

$$D_t = \frac{d\langle x, w\rangle_t}{dt}, \qquad t \ge 0, \tag{29}$$

where $\langle x, w\rangle_t$ is the random process involved in the Doob-Meyer decomposition of the product of the martingales $x$ and $w$:

$$M[x_t w_t - x_s w_s \mid \mathcal{F}_s] = M[\langle x, w\rangle_t - \langle x, w\rangle_s \mid \mathcal{F}_s]. \tag{30}$$

We call the representation in (28) the main equation of (optimal nonlinear) filtering. Most of the known results (within the framework of the assumptions given by (26) and (27)) can be deduced from this equation.

Let us show, for example, how the filtering Equations (17) and (18) in the Kalman-Bucy scheme are deduced from (28), taking, for simplicity, $b(t) \equiv B(t) \equiv 1$.

Comparing (12) with (26) and (27), we see that $A_t(\omega) = A(t)\theta_t$, $w_t = w_2(t)$. Assume $h_t = \theta_t$. Then, due to (12),

$$h_t = h_0 + \int_0^t a(s)\theta_s\,ds + w_1(t). \tag{31}$$

$^2$ Actually, this book examines processes $\xi$ of a somewhat more general kind (see Chapter 8).


The processes $w_1 = (w_1(t))$ and $w_2 = (w_2(t))$, $t \ge 0$, are independent square integrable martingales; hence for them $D_t \equiv 0$ ($P$-a.s.). Then, due to (28), $\pi_t(\theta)$ has the differential

$$d\pi_t(\theta) = a(t)\pi_t(\theta)\,dt + A(t)[\pi_t(\theta^2) - \pi_t^2(\theta)]\,d\overline{W}_t, \tag{32}$$

i.e.,

$$dm_t = a(t)m_t\,dt + A(t)\gamma_t\,d\overline{W}_t, \tag{33}$$

where we have taken advantage of the Gaussian behavior of the process $(\theta, \xi)$: ($P$-a.s.)

$$\pi_t(\theta^2) - \pi_t^2(\theta) = M[(\theta_t - m_t)^2 \mid \mathcal{F}_t^\xi] = M[\theta_t - m_t]^2 = \gamma_t.$$

In order to deduce an equation for $\gamma_t$ from (28), we take $h_t = \theta_t^2$. Then, from the first equation of the system given by (12), we obtain by the Itô formula for substitution of variables (Theorem 4.4),

$$\theta_t^2 = \theta_0^2 + \int_0^t a_s(\omega)\,ds + x_t, \tag{34}$$

where

$$a_s(\omega) = 2a(s)\theta_s^2 + b^2(s)$$

and

$$x_t = 2\int_0^t \theta_s b(s)\,dw_1(s).$$

Therefore, according to (28)

$$d\pi_t(\theta^2) = [2a(t)\pi_t(\theta^2) + b^2(t)]\,dt + A(t)[\pi_t(\theta^3) - \pi_t(\theta)\pi_t(\theta^2)]\,d\overline{W}_t. \tag{35}$$

From (32) and (35) it is seen that, in using the main filtering equation (28), we face the difficulty that finding lower conditional moments requires knowledge of higher ones. Thus, to find the equation for $\pi_t(\theta^2)$, knowledge of the third a posteriori moment $\pi_t(\theta^3) = M(\theta_t^3 \mid \mathcal{F}_t^\xi)$ is required. In the case considered this difficulty is easy to overcome since, due to the Gaussian behavior of the process $(\theta, \xi)$, the moments $\pi_t(\theta^n) = M(\theta_t^n \mid \mathcal{F}_t^\xi)$ for all $n \ge 3$ are expressed in terms of $\pi_t(\theta)$ and $\pi_t(\theta^2)$. In particular,

$$\pi_t(\theta^3) - \pi_t(\theta)\pi_t(\theta^2) = M[\theta_t^2(\theta_t - m_t) \mid \mathcal{F}_t^\xi] = 2m_t\gamma_t$$

and, therefore,

$$d\pi_t(\theta^2) = [2a(t)\pi_t(\theta^2) + b^2(t)]\,dt + 2A(t)m_t\gamma_t\,d\overline{W}_t. \tag{36}$$

By the Itô formula for substitution of variables, from (33) we find that

$$dm_t^2 = 2m_t[a(t)m_t\,dt + A(t)\gamma_t\,d\overline{W}_t] + A^2(t)\gamma_t^2\,dt.$$


Together with Equation (36) this relation provides the required equation for $\gamma_t = \pi_t(\theta^2) - m_t^2$.
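Indeed, subtracting the last relation from (36), the terms with $d\overline{W}_t$ cancel and, since $\gamma_t = \pi_t(\theta^2) - m_t^2$,

$$d\gamma_t = d\pi_t(\theta^2) - dm_t^2 = \bigl[2a(t)\gamma_t + b^2(t) - A^2(t)\gamma_t^2\bigr]dt,$$

which is Equation (18) with $B(t) \equiv 1$.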

The deduction above of Equations (17) and (18) shows that, in order to obtain a closed system of equations defining the optimal filter, some supplementary knowledge about the relations between higher conditional moments is needed.

This book deals mainly with the so-called conditionally Gaussian processes $(\theta, \xi)$, for which it turns out to be possible to obtain closed systems of equations for the optimal nonlinear filter. In this way a wide class of random processes (including processes described by the Kalman-Bucy scheme) is covered for which the problem of constructing an optimal nonlinear filter can be solved effectively. This class of processes $(\theta, \xi)$ is described in the following way.

Assume that the process $(\theta, \xi)$ is a diffusion-type process with the differential

$$d\theta_t = [a_0(t, \xi) + a_1(t, \xi)\theta_t]\,dt + b_1(t, \xi)\,dw_1(t) + b_2(t, \xi)\,dw_2(t),$$

$$d\xi_t = [A_0(t, \xi) + A_1(t, \xi)\theta_t]\,dt + B_1(t, \xi)\,dw_1(t) + B_2(t, \xi)\,dw_2(t), \tag{37}$$

where each of the functionals $a_0(t, \xi), \ldots, B_2(t, \xi)$ is $\mathcal{F}_t^\xi$-measurable for any $t \ge 0$ (compare with the system given by (10)). We emphasize the fact that the unobservable component $\theta_t$ enters into (37) in a linear way, whereas the observable process $\xi$ can enter into the coefficients in any $\mathcal{F}_t^\xi$-measurable way. The Wiener processes $w_1 = (w_1(t))$, $w_2 = (w_2(t))$, $t \ge 0$, and the random vector $(\theta_0, \xi_0)$ included in (37) are assumed to be independent.

It will be proved (Theorem 11.1) that if the conditional distribution $P(\theta_0 \le x \mid \xi_0)$ is (for almost all $\xi_0$) Gaussian, $N(m_0, \gamma_0)$, where $m_0 = M(\theta_0 \mid \xi_0)$, $\gamma_0 = M[(\theta_0 - m_0)^2 \mid \xi_0]$, then the process $(\theta, \xi)$ governed by (37) will be conditionally Gaussian in the sense that for any $t \ge 0$ the conditional distributions $P(\theta_{t_0} \le x_0, \ldots, \theta_{t_k} \le x_k \mid \mathcal{F}_t^\xi)$, $0 \le t_0 < t_1 < \cdots < t_k \le t$, are Gaussian. Hence, in particular, the distribution $P(\theta_t \le x \mid \mathcal{F}_t^\xi)$ is also (almost surely) Gaussian, $N(m_t, \gamma_t)$, with parameters $m_t = M(\theta_t \mid \mathcal{F}_t^\xi)$, $\gamma_t = M[(\theta_t - m_t)^2 \mid \mathcal{F}_t^\xi]$.

For the conditionally Gaussian case (as well as in the Kalman-Bucy scheme), the higher moments $M(\theta_t^n \mid \mathcal{F}_t^\xi)$ are expressed in terms of $m_t$ and $\gamma_t$. This allows us (from the main filtering equation) to obtain a closed system of equations (Theorem 12.1) for $m_t$ and $\gamma_t$:

$$dm_t = [a_0 + a_1 m_t]\,dt + \frac{b_1 B_1 + b_2 B_2 + \gamma_t A_1}{B_1^2 + B_2^2}\,[d\xi_t - (A_0 + A_1 m_t)\,dt], \tag{38}$$

$$\dot\gamma_t = 2a_1\gamma_t + (b_1^2 + b_2^2) - \frac{(b_1 B_1 + b_2 B_2 + \gamma_t A_1)^2}{B_1^2 + B_2^2} \tag{39}$$

(as in (11), the arguments $t$ and $\xi$ of the coefficients are omitted for brevity).


Note that, unlike (18), Equation (39) for $\gamma_t$ is an equation with random coefficients depending on the observed data.

Chapters 10, 11 and 12 deal with optimal linear filtering (according to the scheme given by (12)) and optimal nonlinear filtering for conditionally Gaussian processes (according to the scheme given by (37)). Here, besides filtering, the pertinent results for interpolation and extrapolation problems are also given.

The examples and results given in Chapters 8-12 show how extensively we use in this book such concepts of the theory of random processes as the Wiener process, stochastic differential equations, martingales, square integrable martingales, and so on. The desire to provide sufficient proofs of all the results given in the nonlinear theory necessitated a detailed discussion of the theory of martingales and stochastic differential equations (Chapters 2-6). We hope that the material of these chapters may also be useful to those readers who simply wish to become familiar with results in the theory of martingales and stochastic differential equations.

At the same time we would like to emphasize the fact that without this material it does not seem possible to give a satisfactory description of the theory of optimal nonlinear filtering and related problems.

Chapter 7 discusses results, many of them used later on, on the absolute continuity of measures corresponding to Itô processes and diffusion-type processes.

Chapters 15-17 concern applications of filtering theory to various problems of the statistics of random processes. Here, problems of linear estimation are considered in detail (Chapter 15), and applications to certain control problems and to information theory are discussed (Chapter 16).

Applications to non-Bayesian problems of statistics (maximum likelihood estimation of the coefficients of linear regression, sequential estimation and sequential testing of statistical hypotheses) are given in Chapter 17.

Chapters 18 and 19 deal with point (counting) processes. A typical example of such a process is the Poisson process with constant or variable intensity. The presentation is patterned, to a large extent, along the lines of the treatment in Volume I of Itô processes and diffusion-type processes. Thus we study the structure of martingales of point processes, of the related innovation processes, and the structure of the Radon-Nikodym derivatives. Applications to problems of filtering and of estimation of unknown parameters from observations of point processes are included.

Notes at the end of each chapter contain historical and related background material as well as references to the results discussed in that chapter.

In conclusion, the authors wish to thank their colleagues and friends for assistance and recommendations, especially A.V. Balakrishnan, R.Z. Khasminskii and M.P. Yershov, who made some essential suggestions that we took into consideration.