Paul G. Hoel, Sidney C. Port, and Charles J. Stone, Introduction to Stochastic Processes (The Houghton Mifflin Series in Statistics), Houghton Mifflin, 1972, ISBN 0-395-12076-4.

Hoel · Port · Stone

Introduction to Stochastic Processes

The Houghton Mifflin Series in Statistics under the Editorship of Herman Chernoff

LEO BREIMAN
Probability and Stochastic Processes: With a View Toward Applications
Statistics: With a View Toward Applications

PAUL G. HOEL, SIDNEY C. PORT, AND CHARLES J. STONE
Introduction to Probability Theory
Introduction to Statistical Theory
Introduction to Stochastic Processes

PAUL F. LAZARSFELD AND NEIL W. HENRY
Latent Structure Analysis

GOTTFRIED E. NOETHER
Introduction to Statistics: A Fresh Approach

Y. S. CHOW, HERBERT ROBBINS, AND DAVID SIEGMUND
Great Expectations: The Theory of Optimal Stopping

I. RICHARD SAVAGE
Statistics: Uncertainty and Behavior

Introduction to Stochastic Processes

Paul G. Hoel · Sidney C. Port · Charles J. Stone
University of California, Los Angeles

HOUGHTON MIFFLIN COMPANY, BOSTON
New York · Atlanta · Geneva, Illinois · Dallas · Palo Alto

COPYRIGHT © 1972 BY HOUGHTON MIFFLIN COMPANY.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the publisher.

PRINTED IN THE U.S.A.

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 79-165035

ISBN: 0-395-12076-4

General Preface

This three-volume series grew out of a three-quarter course in probability, statistics, and stochastic processes taught for a number of years at UCLA. We felt a need for a series of books that would treat these subjects in a way that is well coordinated, but which would also give adequate emphasis to each subject as being interesting and useful on its own merits.

The first volume, Introduction to Probability Theory, presents the fundamental ideas of probability theory and also prepares the student both for courses in statistics and for further study in probability theory, including stochastic processes.

The second volume, Introduction to Statistical Theory, develops the basic theory of mathematical statistics in a systematic, unified manner. Together, the first two volumes contain the material that is often covered in a two-semester course in mathematical statistics.

The third volume, Introduction to Stochastic Processes, treats Markov chains, Poisson processes, birth and death processes, Gaussian processes, Brownian motion, and processes defined in terms of Brownian motion by means of elementary stochastic differential equations.

Preface

In recent years there has been an ever increasing interest in the study of systems which vary in time in a random manner. Mathematical models of such systems are known as stochastic processes. In this book we present an elementary account of some of the important topics in the theory of such processes. We have tried to select topics that are conceptually interesting and that have found fruitful application in various branches of science and technology.

A stochastic process can be defined quite generally as any collection of random variables X(t), t ∈ T, defined on a common probability space, where T is a subset of (-∞, ∞) and is thought of as the time parameter set. The process is called a continuous parameter process if T is an interval having positive length and a discrete parameter process if T is a subset of the integers. If the random variables X(t) all take on values from the fixed set S, then S is called the state space of the process.

Many stochastic processes of theoretical and applied interest possess the property that, given the present state of the process, the past history does not affect conditional probabilities of events defined in terms of the future. Such processes are called Markov processes. In Chapters 1 and 2 we study Markov chains, which are discrete parameter Markov processes whose state space is finite or countably infinite. In Chapter 3 we study the corresponding continuous parameter processes, with the "Poisson process" as a special case.

In Chapters 4-6 we discuss continuous parameter processes whose state space is typically the real line. In Chapter 4 we introduce Gaussian processes, which are characterized by the property that every linear combination involving a finite number of the random variables X(t), t ∈ T, is normally distributed. As an important special case, we discuss the Wiener process, which arises as a mathematical model for the physical phenomenon known as "Brownian motion."

In Chapter 5 we discuss integration and differentiation of stochastic processes. There we also use the Wiener process to give a mathematical model for "white noise."

In Chapter 6 we discuss solutions to nonhomogeneous ordinary differential equations having constant coefficients whose right-hand side is either a stochastic process or white noise. We also discuss estimation problems involving stochastic processes, and briefly consider the "spectral distribution" of a process.

This text has been designed for a one-semester course in stochastic processes. Written in close conjunction with Introduction to Probability Theory, the first volume of our three-volume series, it assumes that the student is acquainted with the material covered in a one-semester course in probability for which elementary calculus is a prerequisite.

Some of the proofs in Chapters 1 and 2 are somewhat more difficult than the rest of the text, and they appear in appendices to these chapters. These proofs and the starred material in Section 2.6 probably should be omitted or discussed only briefly in an elementary course.

An instructor using this text in a one-quarter course will probably not have time to cover the entire text. He may wish to cover the first three chapters thoroughly and the remainder as time permits, perhaps discussing those topics in the last three chapters that involve the Wiener process. Another option, however, is to emphasize continuous parameter processes by omitting or skimming Chapters 1 and 2 and concentrating on Chapters 3-6. (For example, the instructor could skip Sections 1.6.1, 1.6.2, 1.9, 2.2.2, 2.5.1, 2.6.1, and 2.8.) With some aid from the instructor, the student should be able to read Chapter 3 without having studied the first two chapters thoroughly. Chapters 4-6 are independent of the first two chapters and depend on Chapter 3 only in minor ways, mainly in that the Poisson process introduced in Chapter 3 is used in examples in the later chapters. The properties of the Poisson process that are needed later are summarized in Chapter 4 and can be regarded as axioms for the Poisson process.

The authors wish to thank the UCLA students who tolerated preliminary versions of this text and whose comments resulted in numerous improvements. Mr. Luis Gorostiza obtained the answers to the exercises and also made many suggestions that resulted in significant improvements. Finally, we wish to thank Mrs. Ruth Goldstein for her excellent typing.

Table of Contents

1  Markov Chains  1
   1.1  Markov chains having two states  2
   1.2  Transition function and initial distribution  5
   1.3  Examples  6
   1.4  Computations with transition functions  12
        1.4.1  Hitting times  14
        1.4.2  Transition matrix  16
   1.5  Transient and recurrent states  17
   1.6  Decomposition of the state space  21
        1.6.1  Absorption probabilities  25
        1.6.2  Martingales  27
   1.7  Birth and death chains  29
   1.8  Branching and queuing chains  33
        1.8.1  Branching chain  34
        1.8.2  Queuing chain  36
   Appendix
   1.9  Proof of results for the branching and queuing chains  36
        1.9.1  Branching chain  38
        1.9.2  Queuing chain  39

2  Stationary Distributions of a Markov Chain  47
   2.1  Elementary properties of stationary distributions  47
   2.2  Examples  49
        2.2.1  Birth and death chain  50
        2.2.2  Particles in a box  53
   2.3  Average number of visits to a recurrent state  56
   2.4  Null recurrent and positive recurrent states  60
   2.5  Existence and uniqueness of stationary distributions  63
        2.5.1  Reducible chains  67
   2.6  Queuing chain  69
        2.6.1  Proof  70
   2.7  Convergence to the stationary distribution  72
   Appendix
   2.8  Proof of convergence  75
        2.8.1  Periodic case  77
        2.8.2  A result from number theory  79

3  Markov Pure Jump Processes  84
   3.1  Construction of jump processes  84
   3.2  Birth and death processes  89
        3.2.1  Two-state birth and death process  92
        3.2.2  Poisson process  94
        3.2.3  Pure birth process  98
        3.2.4  Infinite server queue  99
   3.3  Properties of a Markov pure jump process  102
        3.3.1  Applications to birth and death processes  104

4  Second Order Processes  111
   4.1  Mean and covariance functions  111
   4.2  Gaussian processes  119
   4.3  The Wiener process  122

5  Continuity, Integration, and Differentiation of Second Order Processes  128
   5.1  Continuity assumptions  128
        5.1.1  Continuity of the mean and covariance functions  128
        5.1.2  Continuity of the sample functions  130
   5.2  Integration  132
   5.3  Differentiation  135
   5.4  White noise  141

6  Stochastic Differential Equations, Estimation Theory, and Spectral Distributions  152
   6.1  First order differential equations  154
   6.2  Differential equations of order n  159
        6.2.1  The case n = 2  166
   6.3  Estimation theory  170
        6.3.1  General principles of estimation  173
        6.3.2  Some examples of optimal prediction  174
   6.4  Spectral distribution  177

Answers to Exercises  190
Glossary of Notation  199
Index  201

1  Markov Chains

Consider a system that can be in any one of a finite or countably infinite number of states. Let S denote this set of states. We can assume that S is a subset of the integers. The set S is called the state space of the system. Let the system be observed at the discrete moments of time n = 0, 1, 2, ..., and let X_n denote the state of the system at time n.

Since we are interested in non-deterministic systems, we think of X_n, n ≥ 0, as random variables defined on a common probability space. Little can be said about such random variables unless some additional structure is imposed upon them.

The simplest possible structure is that of independent random variables. This would be a good model for such systems as repeated experiments in which future states of the system are independent of past and present states. In most systems that arise in practice, however, past and present states of the system influence the future states even if they do not uniquely determine them.

Many systems have the property that given the present state, the past states have no influence on the future. This property is called the Markov property, and systems having this property are called Markov chains. The Markov property is defined precisely by the requirement that

(1)  P(X_{n+1} = x_{n+1} | X_0 = x_0, ..., X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n)

for every choice of the nonnegative integer n and the numbers x_0, ..., x_{n+1}, each in S. The conditional probabilities P(X_{n+1} = y | X_n = x) are called the transition probabilities of the chain. In this book we will study Markov chains having stationary transition probabilities, i.e., those such that P(X_{n+1} = y | X_n = x) is independent of n. From now on, when we say that X_n, n ≥ 0, forms a Markov chain, we mean that these random variables satisfy the Markov property and have stationary transition probabilities.

The study of such Markov chains is worthwhile from two viewpoints. First, they have a rich theory, much of which can be presented at an elementary level. Secondly, there are a large number of systems arising in practice that can be modeled by Markov chains, so the subject has many useful applications.

In order to help motivate the general results that will be discussed later, we begin by considering Markov chains having only two states.

1.1  Markov chains having two states

For an example of a Markov chain having two states, consider a machine that at the start of any particular day is either broken down or in operating condition. Assume that if the machine is broken down at the start of the nth day, the probability is p that it will be successfully repaired and in operating condition at the start of the (n+1)th day. Assume also that if the machine is in operating condition at the start of the nth day, the probability is q that it will have a failure causing it to be broken down at the start of the (n+1)th day. Finally, let π_0(0) denote the probability that the machine is broken down initially, i.e., at the start of the 0th day.

Let the state 0 correspond to the machine being broken down and let the state 1 correspond to the machine being in operating condition. Let X_n be the random variable denoting the state of the machine at time n.

According to the above description,

P(X_{n+1} = 1 | X_n = 0) = p,
P(X_{n+1} = 0 | X_n = 1) = q,

and

P(X_0 = 0) = π_0(0).

Since there are only two states, 0 and 1, it follows immediately that

P(X_{n+1} = 0 | X_n = 0) = 1 - p,
P(X_{n+1} = 1 | X_n = 1) = 1 - q,

and that the probability π_0(1) of being initially in state 1 is given by

π_0(1) = P(X_0 = 1) = 1 - π_0(0).

From this information, we can easily compute P(X_n = 0) and P(X_n = 1). We observe that

P(X_{n+1} = 0) = P(X_n = 0 and X_{n+1} = 0) + P(X_n = 1 and X_{n+1} = 0)
             = P(X_n = 0)P(X_{n+1} = 0 | X_n = 0) + P(X_n = 1)P(X_{n+1} = 0 | X_n = 1)
             = (1 - p)P(X_n = 0) + qP(X_n = 1)
             = (1 - p)P(X_n = 0) + q(1 - P(X_n = 0))
             = (1 - p - q)P(X_n = 0) + q.

Now P(X_0 = 0) = π_0(0), so

P(X_1 = 0) = (1 - p - q)π_0(0) + q

and

P(X_2 = 0) = (1 - p - q)P(X_1 = 0) + q
           = (1 - p - q)^2 π_0(0) + q[1 + (1 - p - q)].

It is easily seen by repeating this procedure n times that

(2)  P(X_n = 0) = (1 - p - q)^n π_0(0) + q Σ_{i=0}^{n-1} (1 - p - q)^i.

In the trivial case p = q = 0, it is clear that for all n

P(X_n = 0) = π_0(0)  and  P(X_n = 1) = π_0(1).

Suppose now that p + q > 0. Then by the formula for the sum of a finite geometric progression,

Σ_{i=0}^{n-1} (1 - p - q)^i = (1 - (1 - p - q)^n) / (p + q).

We conclude from (2) that

(3)  P(X_n = 0) = q/(p + q) + (1 - p - q)^n (π_0(0) - q/(p + q)),

and consequently that

(4)  P(X_n = 1) = p/(p + q) + (1 - p - q)^n (π_0(1) - p/(p + q)).

Suppose that p and q are neither both equal to zero nor both equal to one. Then 0 < p + q < 2, which implies that |1 - p - q| < 1. In this case we can let n → ∞ in (3) and (4) and conclude that

(5)  lim_{n→∞} P(X_n = 0) = q/(p + q)  and  lim_{n→∞} P(X_n = 1) = p/(p + q).

We can also obtain the probabilities q/(p + q) and p/(p + q) by a different approach. Suppose we want to choose π_0(0) and π_0(1) so that P(X_n = 0) and P(X_n = 1) are independent of n. It is clear from (3) and (4) that to do this we should choose

π_0(0) = q/(p + q)  and  π_0(1) = p/(p + q).

Thus we see that if X_n, n ≥ 0, starts out with the initial distribution

P(X_0 = 0) = q/(p + q)  and  P(X_0 = 1) = p/(p + q),

then for all n

P(X_n = 0) = q/(p + q)  and  P(X_n = 1) = p/(p + q).
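The recursion and the closed form (3) above can be checked against each other numerically. The following Python sketch uses illustrative values p = 0.3, q = 0.1, and π_0(0) = 0.8 (these numbers are assumptions for the example, not from the text); it iterates the recursion, compares it with (3), and confirms convergence to q/(p + q), as in (5).

```python
def p_broken_recursive(p, q, pi0_0, n):
    """Iterate P(X_{k+1} = 0) = (1 - p - q) * P(X_k = 0) + q from pi0(0)."""
    prob = pi0_0
    for _ in range(n):
        prob = (1 - p - q) * prob + q
    return prob

def p_broken_closed(p, q, pi0_0, n):
    """Closed form (3): q/(p+q) + (1 - p - q)^n * (pi0(0) - q/(p+q))."""
    stat = q / (p + q)
    return stat + (1 - p - q) ** n * (pi0_0 - stat)

p, q, pi0_0 = 0.3, 0.1, 0.8          # illustrative values
for n in (0, 1, 5, 50):
    assert abs(p_broken_recursive(p, q, pi0_0, n)
               - p_broken_closed(p, q, pi0_0, n)) < 1e-12

# As n grows, P(X_n = 0) approaches q/(p + q) = 0.25, in line with (5).
print(p_broken_closed(p, q, pi0_0, 200))
```

Because |1 - p - q| = 0.6 < 1 here, the geometric term dies out and the printed value is the limit q/(p + q) to machine precision.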

The description of the machine is vague because it does not really say whether X_n, n ≥ 0, can be assumed to satisfy the Markov property. Let us suppose, however, that the Markov property does hold. We can use this added information to compute the joint distribution of X_0, X_1, ..., X_n.

For example, let n = 2 and let x_0, x_1, and x_2 each equal 0 or 1. Then

P(X_0 = x_0, X_1 = x_1, and X_2 = x_2)
  = P(X_0 = x_0 and X_1 = x_1)P(X_2 = x_2 | X_0 = x_0 and X_1 = x_1)
  = P(X_0 = x_0)P(X_1 = x_1 | X_0 = x_0)P(X_2 = x_2 | X_0 = x_0 and X_1 = x_1).

Now P(X_0 = x_0) and P(X_1 = x_1 | X_0 = x_0) are determined by p, q, and π_0(0); but without the Markov property, we cannot evaluate P(X_2 = x_2 | X_0 = x_0 and X_1 = x_1) in terms of p, q, and π_0(0). If the Markov property is satisfied, however, then

P(X_2 = x_2 | X_0 = x_0 and X_1 = x_1) = P(X_2 = x_2 | X_1 = x_1),

which is determined by p and q. In this case

P(X_0 = x_0, X_1 = x_1, and X_2 = x_2)
  = P(X_0 = x_0)P(X_1 = x_1 | X_0 = x_0)P(X_2 = x_2 | X_1 = x_1).

For example,

P(X_0 = 0, X_1 = 1, and X_2 = 0)
  = P(X_0 = 0)P(X_1 = 1 | X_0 = 0)P(X_2 = 0 | X_1 = 1)
  = π_0(0)pq.

The reader should check the remaining entries in the following table, which gives the joint distribution of X_0, X_1, and X_2.

x_0  x_1  x_2  P(X_0 = x_0, X_1 = x_1, and X_2 = x_2)
 0    0    0   π_0(0)(1 - p)^2
 0    0    1   π_0(0)(1 - p)p
 0    1    0   π_0(0)pq
 0    1    1   π_0(0)p(1 - q)
 1    0    0   (1 - π_0(0))q(1 - p)
 1    0    1   (1 - π_0(0))qp
 1    1    0   (1 - π_0(0))(1 - q)q
 1    1    1   (1 - π_0(0))(1 - q)^2
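The eight table entries can be verified mechanically. This Python sketch uses illustrative values p = 0.3, q = 0.1, π_0(0) = 0.8 (the symbols come from the text, these particular numbers do not); it checks one entry against the hand computation above and confirms that the entries sum to one.

```python
from itertools import product

p, q, pi0_0 = 0.3, 0.1, 0.8          # illustrative values
pi0 = {0: pi0_0, 1: 1 - pi0_0}
# One-step transition probabilities P(x, y) for the machine chain.
P = {(0, 0): 1 - p, (0, 1): p, (1, 0): q, (1, 1): 1 - q}

def joint(x0, x1, x2):
    """P(X0 = x0, X1 = x1, X2 = x2) = pi0(x0) P(x0, x1) P(x1, x2)."""
    return pi0[x0] * P[(x0, x1)] * P[(x1, x2)]

# The entry for (0, 1, 0) should be pi0(0) * p * q, as computed in the text.
assert abs(joint(0, 1, 0) - pi0_0 * p * q) < 1e-12

# The eight entries form a probability distribution: they sum to one.
total = sum(joint(*xs) for xs in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```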

1.2  Transition function and initial distribution

Let X_n, n ≥ 0, be a Markov chain having state space S. (The restriction to two states is now dropped.) The function P(x, y), x ∈ S and y ∈ S, defined by

(6)  P(x, y) = P(X_1 = y | X_0 = x),  x, y ∈ S,

is called the transition function of the chain. It is such that

(7)  P(x, y) ≥ 0,  x, y ∈ S,

and

(8)  Σ_y P(x, y) = 1,  x ∈ S.

Since the Markov chain has stationary transition probabilities, we see that

(9)  P(X_{n+1} = y | X_n = x) = P(x, y),  n ≥ 1.

It now follows from the Markov property that

(10)  P(X_{n+1} = y | X_0 = x_0, ..., X_{n-1} = x_{n-1}, X_n = x) = P(x, y).

In other words, if the Markov chain is in state x at time n, then no matter how it got to x, it has probability P(x, y) of being in state y at the next step. For this reason the numbers P(x, y) are called the one-step transition probabilities of the Markov chain.

The function π_0(x), x ∈ S, defined by

(11)  π_0(x) = P(X_0 = x),

is called the initial distribution of the chain. It is such that

(12)  π_0(x) ≥ 0,  x ∈ S,

and

(13)  Σ_x π_0(x) = 1.

The joint distribution of X_0, ..., X_n can easily be expressed in terms of the transition function and the initial distribution. For example,

P(X_0 = x_0, X_1 = x_1) = P(X_0 = x_0)P(X_1 = x_1 | X_0 = x_0)
                        = π_0(x_0)P(x_0, x_1).

Also,

P(X_0 = x_0, X_1 = x_1, X_2 = x_2)
  = P(X_0 = x_0, X_1 = x_1)P(X_2 = x_2 | X_0 = x_0, X_1 = x_1)
  = π_0(x_0)P(x_0, x_1)P(X_2 = x_2 | X_0 = x_0, X_1 = x_1).

Since X_n, n ≥ 0, satisfies the Markov property and has stationary transition probabilities, we see that

P(X_2 = x_2 | X_0 = x_0, X_1 = x_1) = P(X_2 = x_2 | X_1 = x_1)
                                    = P(X_1 = x_2 | X_0 = x_1)
                                    = P(x_1, x_2).

Thus

P(X_0 = x_0, X_1 = x_1, X_2 = x_2) = π_0(x_0)P(x_0, x_1)P(x_1, x_2).

By induction it is easily seen that

(14)  P(X_0 = x_0, ..., X_n = x_n) = π_0(x_0)P(x_0, x_1) ··· P(x_{n-1}, x_n).

It is usually more convenient, however, to reverse the order of our definitions. We say that P(x, y), x ∈ S and y ∈ S, is a transition function if it satisfies (7) and (8), and we say that π_0(x), x ∈ S, is an initial distribution if it satisfies (12) and (13). It can be shown that given any transition function P and any initial distribution π_0, there is a probability space and random variables X_n, n ≥ 0, defined on that space satisfying (14). It is not difficult to show that these random variables form a Markov chain having transition function P and initial distribution π_0.

The reader may be bothered by the possibility that some of the conditional probabilities we have discussed may not be well defined. For example, the left side of (1) is not well defined if

P(X_0 = x_0, ..., X_n = x_n) = 0.

This difficulty is easily resolved. Equations (7), (8), (12), and (13) defining the transition functions and the initial distributions are well defined, and Equation (14) describing the joint distribution of X_0, ..., X_n is well defined. It is not hard to show that if (14) holds, then (1), (6), (9), and (10) hold whenever the conditional probabilities in the respective equations are well defined. The same qualification holds for other equations involving conditional probabilities that will be obtained later.

It will soon be apparent that the transition function of a Markov chain plays a much greater role in describing its properties than does the initial distribution. For this reason it is customary to study simultaneously all Markov chains having a given transition function. In fact we adhere to the usual convention that by "a Markov chain having transition function P," we really mean the family of all Markov chains having that transition function.
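Formula (14) expresses any joint probability through π_0 and P alone, and it also tells us how to simulate a chain: sample X_0 from π_0, then repeatedly sample the next state from the current row of P. The sketch below illustrates both for a small finite chain; the three-state π_0 and P are made-up illustrative numbers, not from the text.

```python
import random

pi0 = [0.5, 0.3, 0.2]                 # an illustrative initial distribution
P = [[0.1, 0.6, 0.3],                 # an illustrative transition function:
     [0.4, 0.4, 0.2],                 # row x gives P(x, 0), P(x, 1), P(x, 2)
     [0.0, 0.5, 0.5]]

def path_probability(path):
    """(14): P(X0 = x0, ..., Xn = xn) = pi0(x0) P(x0,x1) ... P(x_{n-1},x_n)."""
    prob = pi0[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= P[x][y]
    return prob

def simulate(n, rng=random):
    """Generate X0, ..., Xn by sampling pi0 once and then P row by row."""
    x = rng.choices(range(3), weights=pi0)[0]
    path = [x]
    for _ in range(n):
        x = rng.choices(range(3), weights=P[x])[0]
        path.append(x)
    return path

# Mathematically, pi0(0) * P(0, 1) * P(1, 2) = 0.5 * 0.6 * 0.2 = 0.06.
print(path_probability([0, 1, 2]))
```

Note that each row of P sums to one, which is exactly condition (8); `path_probability` works for a path of any length, mirroring the induction behind (14).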

1.3  Examples

In this section we will briefly describe several interesting examples of Markov chains. These examples will be further developed in the sequel.

Example 1. Random walk. Let ξ_1, ξ_2, ... be independent integer-valued random variables having common density f. Let X_0 be an integer-valued random variable that is independent of the ξ_i's and set X_n = X_0 + ξ_1 + ··· + ξ_n. The sequence X_n, n ≥ 0, is called a random walk. It is a Markov chain whose state space is the integers and whose transition function is given by

P(x, y) = f(y - x).

To verify this, let π_0 denote the distribution of X_0. Then

P(X_0 = x_0, ..., X_n = x_n)
  = P(X_0 = x_0, ξ_1 = x_1 - x_0, ..., ξ_n = x_n - x_{n-1})
  = P(X_0 = x_0)P(ξ_1 = x_1 - x_0) ··· P(ξ_n = x_n - x_{n-1})
  = π_0(x_0)f(x_1 - x_0) ··· f(x_n - x_{n-1})
  = π_0(x_0)P(x_0, x_1) ··· P(x_{n-1}, x_n),

and thus (14) holds.

Suppose a "particle" moves along the integers according to this Markov chain. Whenever the particle is in x, regardless of how it got there, it jumps to state y with probability f(y - x).

As a special case, consider a simple random walk in which f(1) = p, f(-1) = q, and f(0) = r, where p, q, and r are nonnegative and sum to one. The transition function is given by

P(x, y) = p,  y = x + 1,
          q,  y = x - 1,
          r,  y = x,
          0,  elsewhere.

Let a particle undergo such a random walk. If the particle is in state x at a given observation, then by the next observation it will have jumped to state x + 1 with probability p and to state x - 1 with probability q; with probability r it will still be in state x.
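The simple random walk can be sketched in a few lines of Python; the values p = 0.4, q = 0.4, r = 0.2 are illustrative choices satisfying p + q + r = 1, not values from the text.

```python
import random

def step(x, p=0.4, q=0.4, r=0.2, rng=random):
    """One transition: to x + 1 w.p. p, to x - 1 w.p. q, stay at x w.p. r."""
    return x + rng.choices([1, -1, 0], weights=[p, q, r])[0]

def walk(x0, n, rng=random):
    """Run the walk for n steps starting from X_0 = x0."""
    path = [x0]
    for _ in range(n):
        path.append(step(path[-1], rng=rng))
    return path

path = walk(0, 10, random.Random(1))
# Each increment is +1, -1, or 0, matching the transition function above.
assert all(b - a in (1, -1, 0) for a, b in zip(path, path[1:]))
```

The seeded `random.Random(1)` just makes the run reproducible; any source of randomness with a `choices` method would do.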

Example 2. Ehrenfest chain. The following is a simple model of the exchange of heat or of gas molecules between two isolated bodies. Suppose we have two boxes, labeled 1 and 2, and d balls labeled 1, 2, ..., d. Initially some of these balls are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ..., d, and the ball labeled by that integer is removed from its box and placed in the opposite box. This procedure is repeated indefinitely with the selections being independent from trial to trial. Let X_n denote the number of balls in box 1 after the nth trial. Then X_n, n ≥ 0, is a Markov chain on S = {0, 1, 2, ..., d}.

The transition function of this Markov chain is easily computed. Suppose that there are x balls in box 1 at time n. Then with probability x/d the ball drawn on the (n+1)th trial will be from box 1 and will be transferred to box 2. In this case there will be x - 1 balls in box 1 at time n + 1. Similarly, with probability (d - x)/d the ball drawn on the (n+1)th trial will be from box 2 and will be transferred to box 1, resulting in x + 1 balls in box 1 at time n + 1. Thus the transition function of this Markov chain is given by

P(x, y) = x/d,      y = x - 1,
          1 - x/d,  y = x + 1,
          0,        elsewhere.

Note that in one transition the Ehrenfest chain can only go from state x to x - 1 or to x + 1 with positive probability.

A state a of a Markov chain is called an absorbing state if P(a, a) = 1 or, equivalently, if P(a, y) = 0 for y ≠ a. The next example uses this definition.
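The Ehrenfest transition function can be tabulated as a matrix and checked against conditions (7) and (8); d = 4 below is an arbitrary illustrative choice of the number of balls.

```python
# Build the Ehrenfest transition matrix P(x, y) for d balls.
d = 4
P = [[0.0] * (d + 1) for _ in range(d + 1)]
for x in range(d + 1):
    if x >= 1:
        P[x][x - 1] = x / d          # draw one of the x balls in box 1
    if x <= d - 1:
        P[x][x + 1] = 1 - x / d      # draw one of the d - x balls in box 2

# Every row is a probability distribution, as required by (8).
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

# From x = 0 the chain must move to 1; from x = d it must move to d - 1.
assert P[0][1] == 1.0 and P[d][d - 1] == 1.0
```

The two boundary assertions reflect the remark above: the chain never stays put, so states 0 and d are not absorbing.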

Example 3. Gambler's ruin chain. Suppose a gambler starts out with a certain initial capital in dollars and makes a series of one dollar bets against the house. Assume that he has respective probabilities p and q = 1 - p of winning and losing each bet, and that if his capital ever reaches zero, he is ruined and his capital remains zero thereafter. Let X_n, n ≥ 0, denote the gambler's capital at time n. This is a Markov chain in which 0 is an absorbing state, and for x ≥ 1

(15)  P(x, y) = q,  y = x - 1,
               p,  y = x + 1,
               0,  elsewhere.

Such a chain is called a gambler's ruin chain on S = {0, 1, 2, ...}. We can modify this model by supposing that if the capital of the gambler increases to d dollars he quits playing. In this case 0 and d are both absorbing states, and (15) holds for x = 1, ..., d - 1.

For an alternative interpretation of the latter chain, we can assume that two gamblers are making a series of one dollar bets against each other and that between them they have a total capital of d dollars. Suppose the first gambler has probability p of winning any given bet, and the second gambler has probability q = 1 - p of winning. The two gamblers play until one

of them goes broke. Let X_n denote the capital of the first gambler at time n. Then X_n, n ≥ 0, is a gambler's ruin chain on {0, 1, ..., d}.
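Because 0 and d are absorbing, the two-gambler chain eventually stops moving, and the probability of ruin can be estimated by simulation. The sketch below uses illustrative parameters p = 0.5, d = 10, x_0 = 5 (assumptions for the example, not from the text); for a fair game the exact ruin probability starting from x_0 is the classical 1 - x_0/d.

```python
import random

def play_until_absorbed(x0, d, p, rng):
    """Run the chain from x0 until it hits the absorbing state 0 or d."""
    x = x0
    while 0 < x < d:
        x += 1 if rng.random() < p else -1
    return x

rng = random.Random(0)
trials = 20_000
ruined = sum(play_until_absorbed(5, 10, 0.5, rng) == 0 for _ in range(trials))
# With p = 1/2, x0 = 5, d = 10, the exact ruin probability is 1 - 5/10 = 0.5;
# the Monte Carlo estimate should be close to that.
print(ruined / trials)
```

Absorption is certain here: the walk between two absorbing barriers cannot wander forever, so the `while` loop terminates with probability one.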

Example 4. Birth and death chain. Consider a Markov chain either on S = {0, 1, 2, ...} or on S = {0, 1, ..., d} such that starting from x the chain will be at x - 1, x, or x + 1 after one step. The transition function of such a chain is given by

P(x, y) = q_x,  y = x - 1,
          r_x,  y = x,
          p_x,  y = x + 1,
          0,    elsewhere,

where p_x, q_x, and r_x are nonnegative numbers such that p_x + q_x + r_x = 1.

The Ehrenfest chain and the two versions of the gambler's ruin chain are examples of birth and death chains. The phrase "birth and death" stems from applications in which the state of the chain is the population of some living system. In these applications a transition from state x to state x + 1 corresponds to a "birth," while a transition from state x to state x - 1 corresponds to a "death."

In Chapter 3 we will study birth and death processes. These processes are similar to birth and death chains, except that jumps are allowed to occur at arbitrary times instead of just at integer times. In most applications, the models discussed in Chapter 3 are more realistic than those obtainable by using birth and death chains.

Example 5. Queuing chain. Consider a service facility such as a checkout counter at a supermarket. People arrive at the facility at various times and are eventually served. Those customers that have arrived at the facility but have not yet been served form a waiting line or queue. There are a variety of models to describe such systems. We will consider here only one very simple and somewhat artificial model; others will be discussed in Chapter 3.

Let time be measured in convenient periods, say in minutes. Suppose that if there are any customers waiting for service at the beginning of any given period, exactly one customer will be served during that period, and that if there are no customers waiting for service at the beginning of a period, none will be served during that period. Let ξ_n denote the number of new customers arriving during the nth period. We assume that ξ_1, ξ_2, ... are independent nonnegative integer-valued random variables having common density f.

Let X_0 denote the number of customers present initially, and for n ≥ 1, let X_n denote the number of customers present at the end of the nth period. If X_n = 0, then X_{n+1} = ξ_{n+1}; and if X_n ≥ 1, then X_{n+1} = X_n + ξ_{n+1} - 1. It follows without difficulty from the assumptions on ξ_n, n ≥ 1, that X_n, n ≥ 0, is a Markov chain whose state space is the nonnegative integers and whose transition function P is given by

P(0, y) = f(y)

and

P(x, y) = f(y - x + 1),  x ≥ 1.
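The one-step rule X_{n+1} = ξ_{n+1} if X_n = 0, and X_{n+1} = X_n + ξ_{n+1} - 1 if X_n ≥ 1, translates directly into a simulation. The arrival density f used below (0, 1, or 2 arrivals per period with probabilities 0.5, 0.3, 0.2) is an illustrative assumption, not from the text.

```python
import random

def simulate_queue(x0, n, rng):
    """Return X_0, ..., X_n for the queuing chain with an illustrative f."""
    arrivals = [0, 1, 2]          # support of f
    f = [0.5, 0.3, 0.2]           # f(0), f(1), f(2); sums to one
    path = [x0]
    for _ in range(n):
        xi = rng.choices(arrivals, weights=f)[0]
        x = path[-1]
        # One customer is served only when someone is waiting.
        path.append(xi if x == 0 else x + xi - 1)
    return path

path = simulate_queue(3, 50, random.Random(2))
assert all(x >= 0 for x in path)   # the state space is the nonnegative integers
```

With this f the mean number of arrivals per period is 0.7 < 1, so the queue tends to drain; raising the arrival probabilities above mean one would make it tend to grow.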

Example 6. Branching chain. Consider particles such as neutrons or bacteria that can generate new particles of the same type. The initial set of objects is referred to as belonging to the 0th generation. Particles generated from the nth generation are said to belong to the (n+1)th generation. Let X_n, n ≥ 0, denote the number of particles in the nth generation.

Nothing in this description requires that the various particles in a generation give rise to new particles simultaneously. Indeed at a given time, particles from several generations may coexist.

A typical situation is illustrated in Figure 1: one initial particle gives rise to two particles. Thus X_0 = 1 and X_1 = 2. One of the particles in the first generation gives rise to three particles and the other gives rise to one particle, so that X_2 = 4. We see from Figure 1 that X_3 = 2. Since neither of the particles in the third generation gives rise to new particles, we conclude that X_4 = 0 and consequently that X_n = 0 for all n ≥ 4. In other words, the progeny of the initial particle in the zeroth generation become extinct after three generations.

Figure 1

Page 21: Paul Gerhard Hoel - Introduction to Stochastic Processes (the Houghton Mifflin Series in Statistics) (Houghton Mifflin,1972,0395120764)

1.3. Examples 1 1

In order to model this system as a Markov chain, we suppose that each particle gives rise to � particles in the next generation, where � is a non­negative integer-valued random variable having density f We suppose that the number of offspring of the various particles in the various genera­tions are chosen independently according to the density f

Under these assumptions X_n, n ≥ 0, forms a Markov chain whose state space is the nonnegative integers. State 0 is an absorbing state. For if there are no particles in a given generation, there will not be any particles in the next generation either. For x ≥ 1,

P(x, y) = P(ξ_1 + ··· + ξ_x = y),

where ξ_1, ..., ξ_x are independent random variables having common density f. In particular, P(1, y) = f(y), y ≥ 0.

If a particle gives rise to ξ = 0 particles, the interpretation is that the particle dies or disappears. Suppose a particle gives rise to ξ particles, which in turn give rise to other particles; but after some number of generations, all descendants of the initial particle have died or disappeared (see Figure 1). We describe such an event by saying that the descendants of the original particle eventually become extinct. An interesting problem involving branching chains is to compute the probability ρ of eventual extinction for a branching chain starting with a single particle or, equivalently, the probability that a branching chain starting at state 1 will eventually be absorbed at state 0. Once we determine ρ, we can easily find the probability that in a branching chain starting with x particles the descendants of each of the original particles eventually become extinct. Indeed, since the particles are assumed to act independently in giving rise to new particles, the desired probability is just ρ^x.
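The extinction probability ρ can be computed numerically once f is specified. A standard fact about branching processes (not derived at this point in the text) is that ρ is the smallest nonnegative root of Φ(t) = t, where Φ is the probability generating function of f, and that iterating t ← Φ(t) from 0 converges to that root. A sketch with an assumed offspring density:

```python
# Illustrative offspring density: a particle leaves 0 offspring with
# probability 1/4 and 2 offspring with probability 3/4.
f = {0: 0.25, 2: 0.75}

def phi(t):
    """Probability generating function Phi(t) = sum_y f(y) t^y."""
    return sum(prob * t ** y for y, prob in f.items())

rho = 0.0
for _ in range(200):        # fixed-point iteration rho <- Phi(rho)
    rho = phi(rho)

# Here Phi(t) = t reads (3/4)t^2 - t + 1/4 = 0, with roots 1/3 and 1,
# so rho converges to 1/3; a chain started with x particles dies out
# with probability rho ** x.
```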

The branching chain was used originally to determine the probability that the male line of a given person would eventually become extinct. For this purpose only male children would be included in the various generations.

Example 7. Consider a gene composed of d subunits, where d is some positive integer and each subunit is either normal or mutant in form. Consider a cell with a gene composed of m mutant subunits and d − m normal subunits. Before the cell divides into two daughter cells, the gene duplicates. The corresponding gene of one of the daughter cells is composed of d units chosen at random from the 2m mutant subunits and the 2(d − m) normal subunits. Suppose we follow a fixed line of descent from a given gene. Let X_0 be the number of mutant subunits initially


present, and let X_n, n ≥ 1, be the number present in the nth descendant gene. Then X_n, n ≥ 0, is a Markov chain on 𝒮 = {0, 1, 2, ..., d} and

P(x, y) = C(2x, y) C(2d − 2x, d − y) / C(2d, d),

where C(a, b) denotes the binomial coefficient. States 0 and d are absorbing states for this chain.
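Since the daughter gene's d subunits are drawn at random from 2x mutant and 2(d − x) normal subunits, the transition probabilities are hypergeometric. A sketch (the value of d is an illustrative assumption):

```python
from math import comb

d = 4  # illustrative number of subunits

def P(x, y):
    """P(x, y) for the genetics chain: the daughter gene's d subunits
    are drawn at random from 2x mutant and 2(d - x) normal subunits,
    so the number y of mutant subunits is hypergeometric."""
    return comb(2 * x, y) * comb(2 * d - 2 * x, d - y) / comb(2 * d, d)

# States 0 and d are absorbing: P(0, 0) and P(d, d) both equal 1.
```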

1.4. Computations with transition functions

Let X_n, n ≥ 0, be a Markov chain on 𝒮 having transition function P. In this section we will show how various conditional probabilities can be expressed in terms of P. We will also define the n-step transition function of the Markov chain.

We begin with the formula

(16) P(X_{n+1} = x_{n+1}, ..., X_{n+m} = x_{n+m} | X_0 = x_0, ..., X_n = x_n)
= P(x_n, x_{n+1}) ··· P(x_{n+m−1}, x_{n+m}).

To prove (16) we write the left side of this equation as

P(X_0 = x_0, ..., X_{n+m} = x_{n+m}) / P(X_0 = x_0, ..., X_n = x_n).

By (14) this ratio equals

π_0(x_0)P(x_0, x_1) ··· P(x_{n+m−1}, x_{n+m}) / [π_0(x_0)P(x_0, x_1) ··· P(x_{n−1}, x_n)],

which reduces to the right side of (16). It is convenient to rewrite (16) as

(17) P(X_{n+1} = y_1, ..., X_{n+m} = y_m | X_0 = x_0, ..., X_{n−1} = x_{n−1}, X_n = x)
= P(x, y_1)P(y_1, y_2) ··· P(y_{m−1}, y_m).

Let A_0, ..., A_{n−1} be subsets of 𝒮. It follows from (17) and Exercise 4(a) that

(18) P(X_{n+1} = y_1, ..., X_{n+m} = y_m | X_0 ∈ A_0, ..., X_{n−1} ∈ A_{n−1}, X_n = x)
= P(x, y_1)P(y_1, y_2) ··· P(y_{m−1}, y_m).

Let B_1, ..., B_m be subsets of 𝒮. It follows from (18) and Exercise 4(b) that

(19) P(X_{n+1} ∈ B_1, ..., X_{n+m} ∈ B_m | X_0 ∈ A_0, ..., X_{n−1} ∈ A_{n−1}, X_n = x)
= Σ_{y_1 ∈ B_1} ··· Σ_{y_m ∈ B_m} P(x, y_1)P(y_1, y_2) ··· P(y_{m−1}, y_m).


The m-step transition function P^m(x, y), which gives the probability of going from x to y in m steps, is defined by

(20) P^m(x, y) = Σ_{y_1} ··· Σ_{y_{m−1}} P(x, y_1)P(y_1, y_2) ··· P(y_{m−1}, y)

for m ≥ 2, by P^1(x, y) = P(x, y), and by

P^0(x, y) = 1 if x = y, and P^0(x, y) = 0 elsewhere.

We see by setting B_1 = ··· = B_{m−1} = 𝒮 and B_m = {y} in (19) that

(21) P(X_{n+m} = y | X_0 ∈ A_0, ..., X_{n−1} ∈ A_{n−1}, X_n = x) = P^m(x, y).

In particular, by setting A_0 = ··· = A_{n−1} = 𝒮, we see that

(22) P(X_{n+m} = y | X_n = x) = P^m(x, y).

It also follows from (21) that

(23) P(X_{n+m} = y | X_0 = x, X_n = z) = P^m(z, y).

Since (see Exercise 4(c))

P^{n+m}(x, y) = P(X_{n+m} = y | X_0 = x)
= Σ_z P(X_n = z | X_0 = x) P(X_{n+m} = y | X_0 = x, X_n = z)
= Σ_z P^n(x, z) P(X_{n+m} = y | X_0 = x, X_n = z),

we conclude from (23) that

(24) P^{n+m}(x, y) = Σ_z P^n(x, z) P^m(z, y).

For Markov chains having a finite number of states, (24) allows us to think of P^n as the nth power of the matrix P, an idea we will pursue in Section 1.4.2.
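Formula (24) can be checked numerically by comparing both sides for a small chain. A sketch, using an assumed two-state transition matrix:

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    """n-step transition matrix P^n, n >= 1."""
    out = P
    for _ in range(n - 1):
        out = matmul(out, P)
    return out

# Illustrative two-state chain.
P = [[0.7, 0.3],
     [0.4, 0.6]]

# (24) with n = 2, m = 3: P^5(x, y) = sum_z P^2(x, z) P^3(z, y).
lhs = matpow(P, 5)[0][1]
rhs = sum(matpow(P, 2)[0][z] * matpow(P, 3)[z][1] for z in range(2))
```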

Let π_0 be an initial distribution for the Markov chain. Since

P(X_n = y) = Σ_x P(X_0 = x, X_n = y)
= Σ_x P(X_0 = x) P(X_n = y | X_0 = x),

we see that

(25) P(X_n = y) = Σ_x π_0(x) P^n(x, y).

This formula allows us to compute the distribution of X_n in terms of the initial distribution π_0 and the n-step transition function P^n.


For an alternative method of computing the distribution of X_n, observe that

P(X_{n+1} = y) = Σ_x P(X_n = x, X_{n+1} = y)
= Σ_x P(X_n = x) P(X_{n+1} = y | X_n = x),

so that

(26) P(X_{n+1} = y) = Σ_x P(X_n = x) P(x, y).

If we know the distribution of X_0, we can use (26) to find the distribution of X_1. Then, knowing the distribution of X_1, we can use (26) to find the distribution of X_2. Similarly, we can find the distribution of X_n by applying (26) n times.
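The procedure just described amounts to repeated vector-matrix multiplication. A minimal sketch with an assumed three-state chain started at state 0:

```python
# Illustrative three-state chain and initial distribution.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [1.0, 0.0, 0.0]        # distribution of X_0

# Apply (26) three times; pi becomes the distribution of X_3.
for _ in range(3):
    pi = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
```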

We will use the notation P_x( ) to denote probabilities of various events defined in terms of a Markov chain starting at x. Thus

P_x(X_1 ≠ a, X_2 ≠ a, X_3 = a)

denotes the probability that a Markov chain starting at x is in state a at time 3 but not at time 1 or at time 2. In terms of this notation, (19) can be rewritten as

(27) P(X_{n+1} ∈ B_1, ..., X_{n+m} ∈ B_m | X_0 ∈ A_0, ..., X_{n−1} ∈ A_{n−1}, X_n = x)
= P_x(X_1 ∈ B_1, ..., X_m ∈ B_m).

1.4.1. Hitting times. Let A be a subset of 𝒮. The hitting time T_A of A is defined by

T_A = min (n > 0 : X_n ∈ A)

if X_n ∈ A for some n > 0, and by T_A = ∞ if X_n ∉ A for all n > 0. In other words, T_A is the first positive time the Markov chain is in (hits) A. Hitting times play an important role in the theory of Markov chains. In this book we will be interested mainly in hitting times of sets consisting of a single point. We denote the hitting time of a point a ∈ 𝒮 by T_a rather than by the more cumbersome notation T_{a}.

An important equation involving hitting times is given by

(28) P^n(x, y) = Σ_{m=1}^{n} P_x(T_y = m) P^{n−m}(y, y), n ≥ 1.

In order to verify (28) we note that the events {T_y = m, X_n = y}, 1 ≤ m ≤ n, are disjoint and that

{X_n = y} = ∪_{m=1}^{n} {T_y = m, X_n = y}.


We have in effect decomposed the event {X_n = y} according to the hitting time of y. We see from this decomposition that

P^n(x, y) = Σ_{m=1}^{n} P_x(T_y = m, X_n = y)
= Σ_{m=1}^{n} P_x(T_y = m) P(X_n = y | X_0 = x, T_y = m)
= Σ_{m=1}^{n} P_x(T_y = m) P(X_n = y | X_0 = x, X_1 ≠ y, ..., X_{m−1} ≠ y, X_m = y)
= Σ_{m=1}^{n} P_x(T_y = m) P^{n−m}(y, y),

and hence that (28) holds.

Example 8. Show that if a is an absorbing state, then P^n(x, a) = P_x(T_a ≤ n), n ≥ 1.

If a is an absorbing state, then P^{n−m}(a, a) = 1 for 1 ≤ m ≤ n, and hence (28) implies that

P^n(x, a) = Σ_{m=1}^{n} P_x(T_a = m) P^{n−m}(a, a)
= Σ_{m=1}^{n} P_x(T_a = m) = P_x(T_a ≤ n).

Observe that

P_x(T_y = 1) = P(x, y)

and that

P_x(T_y = 2) = Σ_{z≠y} P_x(X_1 = z, X_2 = y) = Σ_{z≠y} P(x, z) P(z, y).

For higher values of n the probabilities P_x(T_y = n) can be found by using the formula

(29) P_x(T_y = n + 1) = Σ_{z≠y} P(x, z) P_z(T_y = n), n ≥ 1.

This formula is a consequence of (27), but it should also be directly obvious. For in order to go from x to y for the first time at time n + 1, it is necessary to go to some state z ≠ y at the first step and then go from z to y for the first time at the end of n additional steps.
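Recursion (29), together with the starting value P_x(T_y = 1) = P(x, y), translates directly into code. A sketch for an assumed two-state chain:

```python
# Illustrative two-state chain.
P = [[0.5, 0.5],
     [0.2, 0.8]]
states = range(len(P))

def hit_prob(x, y, n):
    """P_x(T_y = n), computed by (29): condition on the first step,
    which must go to some state z != y."""
    if n == 1:
        return P[x][y]
    return sum(P[x][z] * hit_prob(z, y, n - 1) for z in states if z != y)

# Starting at 0, the chain first hits 1 at time n by staying at 0 for
# n - 1 steps and then moving, so P_0(T_1 = n) = 0.5 ** (n - 1) * 0.5.
```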


1.4.2. Transition matrix. Suppose now that the state space 𝒮 is finite, say 𝒮 = {0, 1, ..., d}. In this case we can think of P as the transition matrix having d + 1 rows and columns given by

       0        ···      d
0   P(0, 0)    ···    P(0, d)
⋮      ⋮                 ⋮
d   P(d, 0)    ···    P(d, d)

For example, the transition matrix of the gambler's ruin chain on {0, 1, 2, 3} is

     0  1  2  3
0    1  0  0  0
1    q  0  p  0
2    0  q  0  p
3    0  0  0  1

Similarly, we can regard P^n as an n-step transition matrix. Formula (24) with m = n = 1 becomes

P^2(x, y) = Σ_z P(x, z) P(z, y).

Recalling the definition of ordinary matrix multiplication, we observe that the two-step transition matrix P^2 is the product of the matrix P with itself. More generally, by setting m = 1 in (24) we see that

(30) P^{n+1}(x, y) = Σ_z P^n(x, z) P(z, y).

It follows from (30) by induction that the n-step transition matrix P^n is the nth power of P.

An initial distribution π_0 can be thought of as a (d + 1)-dimensional row vector

π_0 = (π_0(0), ..., π_0(d)).

If we let π_n denote the (d + 1)-dimensional row vector

π_n = (P(X_n = 0), ..., P(X_n = d)),

then (25) and (26) can be written respectively as

π_n = π_0 P^n

and

π_{n+1} = π_n P.

The two-state Markov chain discussed in Section 1.1 is one of the few examples where P^n can be found very easily.


Example 9. Consider the two-state Markov chain having one-step transition matrix

P = [ 1 − p    p   ]
    [   q    1 − q ],

where p + q > 0. Find P^n.

In order to find P^n(0, 0) = P_0(X_n = 0), we set π_0(0) = 1 in (3) and obtain

P^n(0, 0) = q/(p + q) + (1 − p − q)^n p/(p + q).

In order to find P^n(0, 1) = P_0(X_n = 1), we set π_0(1) = 0 in (4) and obtain

P^n(0, 1) = p/(p + q) − (1 − p − q)^n p/(p + q).

Similarly, we conclude that

P^n(1, 0) = q/(p + q) − (1 − p − q)^n q/(p + q)

and

P^n(1, 1) = p/(p + q) + (1 − p − q)^n q/(p + q).

It follows that

P^n = 1/(p + q) [ q  p ]  +  (1 − p − q)^n/(p + q) [  p  −p ]
                [ q  p ]                           [ −q   q ].
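The closed form just obtained can be checked against brute-force matrix powers. A sketch with illustrative values of p and q:

```python
p, q = 0.3, 0.1
P = [[1 - p, p],
     [q, 1 - q]]

def matpow(M, n):
    """n-th power of a 2x2 matrix by repeated multiplication."""
    out = M
    for _ in range(n - 1):
        out = [[sum(out[i][k] * M[k][j] for k in range(2))
                for j in range(2)] for i in range(2)]
    return out

n = 6
s = (1 - p - q) ** n
# Entries of P^n according to the closed form of Example 9.
closed = [[(q + s * p) / (p + q), (p - s * p) / (p + q)],
          [(q - s * q) / (p + q), (p + s * q) / (p + q)]]
brute = matpow(P, n)
```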

1.5. Transient and recurrent states

Let X_n, n ≥ 0, be a Markov chain having state space 𝒮 and transition function P. Set

ρ_xy = P_x(T_y < ∞).

Then ρ_xy denotes the probability that a Markov chain starting at x will be in state y at some positive time. In particular, ρ_yy denotes the probability that a Markov chain starting at y will ever return to y. A state y is called recurrent if ρ_yy = 1 and transient if ρ_yy < 1. If y is a recurrent state, a Markov chain starting at y returns to y with probability one. If y is a transient state, a Markov chain starting at y has positive probability 1 − ρ_yy of never returning to y. If y is an absorbing state, then P_y(T_y = 1) =


P(y, y) = 1 and hence ρ_yy = 1; thus an absorbing state is necessarily recurrent.

Let 1_y(z), z ∈ 𝒮, denote the indicator function of the set {y} defined by

1_y(z) = 1 if z = y, and 1_y(z) = 0 if z ≠ y.

Let N(y) denote the number of times n ≥ 1 that the chain is in state y. Since 1_y(X_n) = 1 if the chain is in state y at time n and 1_y(X_n) = 0 otherwise, we see that

(31) N(y) = Σ_{n=1}^{∞} 1_y(X_n).

The event {N(y) ≥ 1} is the same as the event {T_y < ∞}. Thus

P_x(N(y) ≥ 1) = P_x(T_y < ∞) = ρ_xy.

Let m and n be positive integers. By (27), the probability that a Markov chain starting at x first visits y at time m and next visits y n units of time later is P_x(T_y = m) P_y(T_y = n). Thus

P_x(N(y) ≥ 2) = Σ_{m=1}^{∞} Σ_{n=1}^{∞} P_x(T_y = m) P_y(T_y = n) = ρ_xy ρ_yy.

Similarly we conclude that

(32) P_x(N(y) ≥ m) = ρ_xy ρ_yy^{m−1}, m ≥ 1.

Since

P_x(N(y) = m) = P_x(N(y) ≥ m) − P_x(N(y) ≥ m + 1),

it follows from (32) that

(33) P_x(N(y) = m) = ρ_xy ρ_yy^{m−1}(1 − ρ_yy), m ≥ 1.

Also

P_x(N(y) = 0) = 1 − P_x(N(y) ≥ 1),

so that

(34) P_x(N(y) = 0) = 1 − ρ_xy.

These formulas are intuitively obvious. To see why (33) should be true, for example, observe that a chain starting at x visits state y exactly m times if and only if it visits y for a first time, returns to y m − 1 additional times, and then never again returns to y.


We use the notation E_x( ) to denote expectations of random variables defined in terms of a Markov chain starting at x. For example,

(35) E_x(1_y(X_n)) = P_x(X_n = y) = P^n(x, y).

It follows from (31) and (35) that

E_x(N(y)) = Σ_{n=1}^{∞} E_x(1_y(X_n)) = Σ_{n=1}^{∞} P^n(x, y).

Set

G(x, y) = E_x(N(y)) = Σ_{n=1}^{∞} P^n(x, y).

Then G(x, y) denotes the expected number of visits to y for a Markov chain starting at x.

Theorem 1 (i) Let y be a transient state. Then P_x(N(y) < ∞) = 1 and

(36) G(x, y) = ρ_xy / (1 − ρ_yy), x ∈ 𝒮,

which is finite for all x ∈ 𝒮.

(ii) Let y be a recurrent state. Then P_y(N(y) = ∞) = 1 and G(y, y) = ∞. Also

(37) P_x(N(y) = ∞) = ρ_xy, x ∈ 𝒮.

If ρ_xy = 0, then G(x, y) = 0, while if ρ_xy > 0, then G(x, y) = ∞.

This theorem describes the fundamental difference between a transient state and a recurrent state. If y is a transient state, then no matter where the Markov chain starts, it makes only a finite number of visits to y and the expected number of visits to y is finite. Suppose instead that y is a recurrent state. Then if the Markov chain starts at y, it returns to y infinitely often. If the chain starts at some other state x, it may be impossible for it to ever hit y. If it is possible, however, and the chain does visit y at least once, then it does so infinitely often.


Proof. Let y be a transient state. Since 0 ≤ ρ_yy < 1, it follows from (32) that

P_x(N(y) = ∞) = lim_{m→∞} P_x(N(y) ≥ m) = lim_{m→∞} ρ_xy ρ_yy^{m−1} = 0.

By (33)

G(x, y) = E_x(N(y)) = Σ_{m=1}^{∞} m P_x(N(y) = m) = Σ_{m=1}^{∞} m ρ_xy ρ_yy^{m−1}(1 − ρ_yy).

Substituting t = ρ_yy in the power series

Σ_{m=1}^{∞} m t^{m−1} = 1/(1 − t)^2,

we conclude that

G(x, y) = ρ_xy / (1 − ρ_yy) < ∞.

This completes the proof of (i).

Now let y be recurrent. Then ρ_yy = 1 and it follows from (32) that

P_x(N(y) = ∞) = lim_{m→∞} P_x(N(y) ≥ m) = lim_{m→∞} ρ_xy = ρ_xy.

In particular, P_y(N(y) = ∞) = 1. If a nonnegative random variable has positive probability of being infinite, its expectation is infinite. Thus

G(y, y) = E_y(N(y)) = ∞.

If ρ_xy = 0, then P_x(T_y = m) = 0 for all finite positive integers m, so (28) implies that P^n(x, y) = 0, n ≥ 1; thus G(x, y) = 0 in this case. If ρ_xy > 0, then P_x(N(y) = ∞) = ρ_xy > 0 and hence

G(x, y) = E_x(N(y)) = ∞.

This completes the proof of Theorem 1. ∎

Let y be a transient state. Since

Σ_{n=1}^{∞} P^n(x, y) = G(x, y) < ∞, x ∈ 𝒮,

we see that

(38) lim_{n→∞} P^n(x, y) = 0, x ∈ 𝒮.


A Markov chain is called a transient chain if all of its states are transient and a recurrent chain if all of its states are recurrent. It is easy to see that a Markov chain having a finite state space must have at least one recurrent state and hence cannot possibly be a transient chain. For if 𝒮 is finite and all states are transient, then by (38)

0 = Σ_{y ∈ 𝒮} lim_{n→∞} P^n(x, y)
= lim_{n→∞} Σ_{y ∈ 𝒮} P^n(x, y)
= lim_{n→∞} P_x(X_n ∈ 𝒮)
= lim_{n→∞} 1 = 1,

which is a contradiction.

1.6. Decomposition of the state space

Let x and y be two not necessarily distinct states. We say that x leads to y if ρ_xy > 0. It is left as an exercise for the reader to show that x leads to y if and only if P^n(x, y) > 0 for some positive integer n. It is also left to the reader to show that if x leads to y and y leads to z, then x leads to z.

Theorem 2 Let x be a recurrent state and suppose that x leads to y. Then y is recurrent and ρ_xy = ρ_yx = 1.

Proof. We assume that y ≠ x, for otherwise there is nothing to prove. Since

P_x(T_y < ∞) = ρ_xy > 0,

we see that P_x(T_y = n) > 0 for some positive integer n. Let n_0 be the least such positive integer, i.e., set

(39) n_0 = min (n ≥ 1 : P_x(T_y = n) > 0).

It follows easily from (39) and (28) that P^{n_0}(x, y) > 0 and

(40) P^m(x, y) = 0, 1 ≤ m < n_0.

Since P^{n_0}(x, y) > 0, we can find states y_1, ..., y_{n_0−1} such that

P_x(X_1 = y_1, ..., X_{n_0−1} = y_{n_0−1}, X_{n_0} = y) = P(x, y_1) ··· P(y_{n_0−1}, y) > 0.

None of the states y_1, ..., y_{n_0−1} equals x or y; for if one of them did equal x or y, it would be possible to go from x to y with positive probability in fewer than n_0 steps, in contradiction to (40).


We will now show that ρ_yx = 1. Suppose on the contrary that ρ_yx < 1. Then a Markov chain starting at y has positive probability 1 − ρ_yx of never hitting x. More to the point, a Markov chain starting at x has the positive probability

P(x, y_1) ··· P(y_{n_0−1}, y)(1 − ρ_yx)

of visiting the states y_1, ..., y_{n_0−1}, y successively in the first n_0 times and never returning to x after time n_0. But if this happens, the Markov chain never returns to x at any time n ≥ 1, so we have contradicted the assumption that x is a recurrent state.

Since ρ_yx = 1, there is a positive integer n_1 such that P^{n_1}(y, x) > 0. Now

P^{n_1+n+n_0}(y, y) = P_y(X_{n_1+n+n_0} = y)
≥ P_y(X_{n_1} = x, X_{n_1+n} = x, X_{n_1+n+n_0} = y)
= P^{n_1}(y, x) P^n(x, x) P^{n_0}(x, y).

Hence

G(y, y) ≥ Σ_{n=n_1+1+n_0}^{∞} P^n(y, y)
= Σ_{n=1}^{∞} P^{n_1+n+n_0}(y, y)
≥ P^{n_1}(y, x) P^{n_0}(x, y) Σ_{n=1}^{∞} P^n(x, x)
= P^{n_1}(y, x) P^{n_0}(x, y) G(x, x) = +∞,

from which it follows that y is also a recurrent state.

Since y is recurrent and y leads to x, we see from the part of the theorem that has already been verified that ρ_xy = 1. This completes the proof. ∎

A nonempty set C of states is said to be closed if no state inside of C leads to any state outside of C, i.e., if

(41) ρ_xy = 0, x ∈ C and y ∉ C.

Equivalently (see Exercise 16), C is closed if and only if

(42) P^n(x, y) = 0, x ∈ C, y ∉ C, and n ≥ 1.

Actually, even from the weaker condition

(43) P(x, y) = 0, x ∈ C and y ∉ C,

we can prove that C is closed. For if (43) holds, then for x ∈ C and y ∉ C

P^2(x, y) = Σ_{z ∈ 𝒮} P(x, z) P(z, y) = Σ_{z ∈ C} P(x, z) P(z, y) = 0,


and (42) follows by induction. If C is closed, then a Markov chain starting in C will, with probability one, stay in C for all time. If a is an absorbing state, then {a} is closed.

A closed set C is called irreducible if x leads to y for all choices of x and y in C. It follows from Theorem 2 that if C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient. The next result is an immediate consequence of Theorems 1 and 2.

Corollary 1 Let C be an irreducible closed set of recurrent states. Then ρ_xy = 1, P_x(N(y) = ∞) = 1, and G(x, y) = ∞ for all choices of x and y in C.

An irreducible Markov chain is a chain whose state space is irreducible, that is, a chain in which every state leads back to itself and also to every other state. Such a Markov chain is necessarily either a transient chain or a recurrent chain. Corollary 1 implies, in particular, that an irreducible recurrent Markov chain visits every state infinitely often with probability one.

We saw in Section 1.5 that if 𝒮 is finite, it contains at least one recurrent state. The same argument shows that any finite closed set of states contains at least one recurrent state. Now let C be a finite irreducible closed set. We have seen that either every state in C is transient or every state in C is recurrent, and that C has at least one recurrent state. It follows that every state in C is recurrent. We summarize this result:

Theorem 3 Let C be a finite irreducible closed set of states. Then every state in C is recurrent.

Consider a Markov chain having a finite number of states. Theorem 3 implies that if the chain is irreducible it must be recurrent. If the chain is not irreducible, we can use Theorems 2 and 3 to determine which states are recurrent and which are transient.

Example 10. Consider a Markov chain having the transition matrix

      0    1    2    3    4    5
0     1    0    0    0    0    0
1    1/4  1/2  1/4   0    0    0
2     0   1/5  2/5  1/5   0   1/5
3     0    0    0   1/6  1/3  1/2
4     0    0    0   1/2   0   1/2
5     0    0    0   1/4   0   3/4

Determine which states are recurrent and which states are transient.


As a first step in studying this Markov chain, we determine by inspection which states lead to which other states. This can be indicated in matrix form as

      0  1  2  3  4  5
0     +  0  0  0  0  0
1     +  +  +  +  +  +
2     +  +  +  +  +  +
3     0  0  0  +  +  +
4     0  0  0  +  +  +
5     0  0  0  +  +  +

The x, y element of this matrix is + or 0 according as ρ_xy is positive or zero, i.e., according as x does or does not lead to y. Of course, if P(x, y) > 0, then ρ_xy > 0. The converse is certainly not true in general. For example, P(2, 0) = 0; but

P^2(2, 0) = P(2, 1)P(1, 0) = 1/5 · 1/4 = 1/20 > 0,

so that ρ_20 > 0.

State 0 is an absorbing state, and hence also a recurrent state. We see clearly from the matrix of +'s and 0's that {3, 4, 5} is an irreducible closed set. Theorem 3 now implies that 3, 4, and 5 are recurrent states. States 1 and 2 both lead to 0, but neither can be reached from 0. We see from Theorem 2 that 1 and 2 must both be transient states. In summary, states 1 and 2 are transient, and states 0, 3, 4, and 5 are recurrent.
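The "leads to" relation of this example can be computed mechanically as the transitive closure of the one-step relation P(x, y) > 0. A sketch using the transition matrix of Example 10:

```python
from fractions import Fraction as F

# Transition matrix of Example 10.
P = [
    [1, 0, 0, 0, 0, 0],
    [F(1, 4), F(1, 2), F(1, 4), 0, 0, 0],
    [0, F(1, 5), F(2, 5), F(1, 5), 0, F(1, 5)],
    [0, 0, 0, F(1, 6), F(1, 3), F(1, 2)],
    [0, 0, 0, F(1, 2), 0, F(1, 2)],
    [0, 0, 0, F(1, 4), 0, F(3, 4)],
]
n = len(P)

# x leads to y iff P^m(x, y) > 0 for some m >= 1; compute this as a
# Warshall-style transitive closure of the relation P(x, y) > 0.
leads = [[P[x][y] > 0 for y in range(n)] for x in range(n)]
for k in range(n):
    for x in range(n):
        for y in range(n):
            leads[x][y] = leads[x][y] or (leads[x][k] and leads[k][y])

# {3, 4, 5} is closed (nothing inside leads outside), while 1 and 2
# lead to the absorbing state 0 but cannot be reached from it.
closed_345 = all(not leads[x][y] for x in (3, 4, 5) for y in (0, 1, 2))
```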

Let 𝒮_T denote the collection of transient states in 𝒮, and let 𝒮_R denote the collection of recurrent states in 𝒮. In Example 10, 𝒮_T = {1, 2} and 𝒮_R = {0, 3, 4, 5}. The set 𝒮_R can be decomposed into the disjoint irreducible closed sets C_1 = {0} and C_2 = {3, 4, 5}. The next theorem shows that such a decomposition is always possible whenever 𝒮_R is nonempty.

Theorem 4 Suppose that the set 𝒮_R of recurrent states is nonempty. Then 𝒮_R is the union of a finite or countably infinite number of disjoint irreducible closed sets C_1, C_2, ....

Proof. Choose x ∈ 𝒮_R and let C be the set of all states y in 𝒮_R such that x leads to y. Since x is recurrent, ρ_xx = 1 and hence x ∈ C. We will now verify that C is an irreducible closed set. Suppose that y is in C and y leads to z. Since y is recurrent, it follows from Theorem 2 that z is recurrent. Since x leads to y and y leads to z, we conclude that x leads to z. Thus z is in C. This shows that C is closed. Suppose that y and z are both in C. Since x is recurrent and x leads to y, it follows from


Theorem 2 that y leads to x. Since y leads to x and x leads to z, we conclude that y leads to z. This shows that C is irreducible.

To complete the proof of the theorem, we need only show that if C and D are two irreducible closed subsets of 𝒮_R, they are either disjoint or identical. Suppose they are not disjoint and let x be in both C and D. Choose y in C. Now x leads to y, since x is in C and C is irreducible. Since D is closed, x is in D, and x leads to y, we conclude that y is in D. Thus every state in C is also in D. Similarly every state in D is also in C, so that C and D are identical. ∎

We can use our decomposition of the state space of a Markov chain to understand the behavior of such a system. If the Markov chain starts out in one of the irreducible closed sets C_i of recurrent states, it stays in C_i forever and, with probability one, visits every state in C_i infinitely often. If the Markov chain starts out in the set of transient states 𝒮_T, it either stays in 𝒮_T forever or, at some time, enters one of the sets C_i and stays there from that time on, again visiting every state in that C_i infinitely often.

1.6.1. Absorption probabilities. Let C be one of the irreducible closed sets of recurrent states, and let ρ_C(x) = P_x(T_C < ∞) be the probability that a Markov chain starting at x eventually hits C. Since the chain remains permanently in C once it hits that set, we call ρ_C(x) the probability that a chain starting at x is absorbed by the set C. Clearly ρ_C(x) = 1, x ∈ C, and ρ_C(x) = 0 if x is a recurrent state not in C. It is not so clear how to compute ρ_C(x) for x ∈ 𝒮_T, the set of transient states.

If there are only a finite number of transient states, and in particular if 𝒮 itself is finite, it is always possible to compute ρ_C(x), x ∈ 𝒮_T, by solving a system of linear equations in which there are as many equations as unknowns, i.e., members of 𝒮_T. To understand why this is the case, observe that if x ∈ 𝒮_T, a chain starting at x can enter C only by entering C at time 1 or by being in 𝒮_T at time 1 and entering C at some future time. The former event has probability Σ_{y ∈ C} P(x, y) and the latter event has probability Σ_{y ∈ 𝒮_T} P(x, y) ρ_C(y). Thus

(44) ρ_C(x) = Σ_{y ∈ C} P(x, y) + Σ_{y ∈ 𝒮_T} P(x, y) ρ_C(y), x ∈ 𝒮_T.

Equation (44) holds whether 𝒮_T is finite or infinite, but it is far from clear how to solve (44) for the unknowns ρ_C(x), x ∈ 𝒮_T, when 𝒮_T is infinite. An additional difficulty is that if 𝒮_T is infinite, then (44) need not have a unique solution. Fortunately this difficulty does not arise if 𝒮_T is finite.


Theorem 5 Suppose the set 𝒮_T of transient states is finite and let C be an irreducible closed set of recurrent states. Then the system of equations

(45) f(x) = Σ_{y ∈ C} P(x, y) + Σ_{y ∈ 𝒮_T} P(x, y) f(y), x ∈ 𝒮_T,

has the unique solution

(46) f(x) = ρ_C(x), x ∈ 𝒮_T.

Proof. If (45) holds, then

f(y) = Σ_{z ∈ C} P(y, z) + Σ_{z ∈ 𝒮_T} P(y, z) f(z), y ∈ 𝒮_T.

Substituting this into (45) we find that

f(x) = Σ_{y ∈ C} P(x, y) + Σ_{y ∈ 𝒮_T} Σ_{z ∈ C} P(x, y) P(y, z) + Σ_{y ∈ 𝒮_T} Σ_{z ∈ 𝒮_T} P(x, y) P(y, z) f(z).

The sum of the first two terms is just P_x(T_C ≤ 2), and the third term reduces to Σ_{z ∈ 𝒮_T} P^2(x, z) f(z), which is the same as Σ_{y ∈ 𝒮_T} P^2(x, y) f(y). Thus

f(x) = P_x(T_C ≤ 2) + Σ_{y ∈ 𝒮_T} P^2(x, y) f(y).

By repeating this argument indefinitely or by using induction, we conclude that for all positive integers n

(47) f(x) = P_x(T_C ≤ n) + Σ_{y ∈ 𝒮_T} P^n(x, y) f(y), x ∈ 𝒮_T.

Since each y ∈ 𝒮_T is transient, it follows from (38) that

(48) lim_{n→∞} P^n(x, y) = 0, x ∈ 𝒮 and y ∈ 𝒮_T.

According to the assumptions of the theorem, 𝒮_T is a finite set. It therefore follows from (48) that the sum in (47) approaches zero as n → ∞. Consequently for x ∈ 𝒮_T

f(x) = lim_{n→∞} P_x(T_C ≤ n) = P_x(T_C < ∞) = ρ_C(x),

as desired. ∎

Example 11. Consider the Markov chain discussed in Example 10. Find ρ_10 = ρ_{0}(1) and ρ_20 = ρ_{0}(2).

From (44) and the transition matrix in Example 10, we see that ρ_10 and ρ_20 are determined by the equations

ρ_10 = 1/4 + (1/2)ρ_10 + (1/4)ρ_20


and

ρ_20 = (1/5)ρ_10 + (2/5)ρ_20.

Solving these equations we find that ρ_10 = 3/5 and ρ_20 = 1/5.


By similar methods we conclude that ρ_{3,4,5}(1) = 2/5 and ρ_{3,4,5}(2) = 4/5. Alternatively, we can obtain these probabilities by subtracting ρ_{0}(1) and ρ_{0}(2) from 1, since if there are only a finite number of transient states,

(49) Σ_i ρ_{C_i}(x) = 1, x ∈ 𝒮_T.

To verify (49) we note that for x ∈ 𝒮_T

Σ_i ρ_{C_i}(x) = Σ_i P_x(T_{C_i} < ∞) = P_x(T_{𝒮_R} < ∞).

Since there are only a finite number of transient states and each transient state is visited only finitely many times, the probability P_x(T_{𝒮_R} < ∞) that a recurrent state will eventually be hit is 1, so (49) holds.

Once a Markov chain starting at a transient state x enters an irreducible closed set C of recurrent states, it visits every state in C. Thus

(50) ρ_xy = ρ_C(x), x ∈ 𝒮_T and y ∈ C.

It follows from (50) that in our previous example

ρ_13 = ρ_14 = ρ_15 = ρ_{3,4,5}(1) = 2/5 and ρ_23 = ρ_24 = ρ_25 = ρ_{3,4,5}(2) = 4/5.
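Systems of the form (44) are ordinary linear systems and can be solved mechanically. A sketch that recovers the absorption probabilities of Example 11 exactly, using rational arithmetic (the rearrangement into a 2x2 system is ours, not the text's):

```python
from fractions import Fraction as F

# The system (44) for absorption into {0}, restricted to the transient
# states 1 and 2 of Example 10:
#   rho_10 = 1/4 + (1/2) rho_10 + (1/4) rho_20
#   rho_20 =       (1/5) rho_10 + (2/5) rho_20
# Rearranged as (I - Q) rho = b, with Q the transient-to-transient
# block of the transition matrix.
Q = [[F(1, 2), F(1, 4)],
     [F(1, 5), F(2, 5)]]
b = [F(1, 4), F(0)]

# Solve the 2x2 system (I - Q) rho = b by Cramer's rule.
a11, a12 = 1 - Q[0][0], -Q[0][1]
a21, a22 = -Q[1][0], 1 - Q[1][1]
det = a11 * a22 - a12 * a21
rho_10 = (b[0] * a22 - a12 * b[1]) / det
rho_20 = (a11 * b[1] - b[0] * a21) / det
```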

1.6.2. Martingales. Consider a Markov chain having state space {0, ..., d} and transition function P such that

(51) Σ_{y=0}^{d} y P(x, y) = x, x = 0, ..., d.

Now

E[X_{n+1} | X_0 = x_0, ..., X_{n−1} = x_{n−1}, X_n = x]
= Σ_{y=0}^{d} y P(X_{n+1} = y | X_0 = x_0, ..., X_{n−1} = x_{n−1}, X_n = x)
= Σ_{y=0}^{d} y P(x, y)

by the Markov property. We conclude from (51) that

(52) E[X_{n+1} | X_0 = x_0, ..., X_{n−1} = x_{n−1}, X_n = x] = x,

i.e., that the expected value of X_{n+1} given the past and present values of X_0, ..., X_n equals the present value X_n. A sequence of random variables


having this property is called a martingale. Martingales, which need not be Markov chains, play a very important role in modern probability theory. They arose first in connection with gambling. If X_n denotes the capital of a gambler after time n and if all bets are "fair," that is, if they result in zero expected gain to the gambler, then X_n, n ≥ 0, forms a martingale. Gamblers were naturally interested in finding some betting strategy, such as increasing their bets until they win, that would give them a net expected gain after making a series of fair bets. That this has been shown to be mathematically impossible does not seem to have deterred them from their quest.

It follows from (51) that

Σ_{y=0}^{d} y P(0, y) = 0,

and hence that P(0, 1) = ··· = P(0, d) = 0. Thus 0 is necessarily an absorbing state. It follows similarly that d is an absorbing state. Consider now a Markov chain satisfying (51) and having no absorbing states other than 0 and d. It is left as an exercise for the reader to show that under these conditions the states 1, ..., d − 1 each lead to state 0, and hence each is a transient state. If the Markov chain starts at x, it will eventually enter one of the two absorbing states 0 and d and remain there permanently.

It follows from Example 8 that

E_x(X_n) = Σ_{y=0}^{d} y P_x(X_n = y)
= Σ_{y=0}^{d} y P^n(x, y)
= Σ_{y=1}^{d−1} y P^n(x, y) + d P^n(x, d)
= Σ_{y=1}^{d−1} y P^n(x, y) + d P_x(T_d ≤ n).

Since states 1, 2, ..., d − 1 are transient, we see that P^n(x, y) → 0 as n → ∞ for y = 1, 2, ..., d − 1. Consequently,

lim_{n→∞} E_x(X_n) = d P_x(T_d < ∞) = d ρ_xd.

On the other hand, it follows from (51) (see Exercise 13(a)) that E(X_n) = E(X_{n−1}) = ··· = E(X_0) and hence that E_x(X_n) = x. Thus

lim_{n→∞} E_x(X_n) = x.


By equating the two values of this limit, we conclude that

(53) ρ_xd = x/d, x = 0, ..., d.

Since ρ_x0 + ρ_xd = 1, it follows from (53) that

ρ_x0 = 1 − x/d, x = 0, ..., d.


Of course, once (53) is conjectured, it is easily proved directly from Theorem 5. We need only verify that for x = 1, ..., d − 1,

(54) x/d = P(x, d) + Σ_{y=1}^{d−1} (y/d) P(x, y).

Clearly (54) follows from (51). The genetics chain introduced in Example 7 satisfies (51), as does a gambler's ruin chain on {0, 1, . . . , d} having transition matrix of the form

    ⎡  1    0    0   · · ·   0    0  ⎤
    ⎢ 1/2   0   1/2  · · ·   0    0  ⎥
    ⎢  0   1/2   0   · · ·   0    0  ⎥
    ⎢  ·    ·    ·           ·    ·  ⎥
    ⎢  0    0    0   · · ·   0   1/2 ⎥
    ⎣  0    0    0   · · ·   0    1  ⎦

Suppose two gamblers make a series of one dollar bets until one of them goes broke, and suppose that each gambler has probability 1/2 of winning any given bet. If the first gambler has an initial capital of x dollars and the second gambler has an initial capital of d − x dollars, then the second gambler has probability P_{xd} = x/d of going broke and the first gambler has probability 1 − (x/d) of going broke.
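The identity P_{xd} = x/d can be checked numerically; the sketch below (not from the text, with d = 10 an arbitrary choice) iterates the first-step equations for the absorption probabilities h(x) = P_x(T_d < T_0), whose limit is the left side of (53).

```python
# Numerical check of P_{xd} = x/d for the fair gambler's ruin chain on
# {0, ..., d}.  Iterate the first-step equations
#   h(x) = (1/2) h(x - 1) + (1/2) h(x + 1),  0 < x < d,
# with absorbing boundaries h(0) = 0 and h(d) = 1.
d = 10
h = [0.0] * d + [1.0]            # start from the indicator of state d
for _ in range(20000):           # iterate to (numerical) convergence
    h = [0.0] + [0.5 * (h[x - 1] + h[x + 1]) for x in range(1, d)] + [1.0]

for x in range(d + 1):
    assert abs(h[x] - x / d) < 1e-6
print([round(v, 4) for v in h])  # [0.0, 0.1, 0.2, ..., 1.0]
```

The iteration converges because the error satisfies the same averaging recursion with zero boundary values, so it shrinks geometrically.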

1.7. Birth and death chains

For an irreducible Markov chain either every state is recurrent or every state is transient, so that an irreducible Markov chain is either a recurrent chain or a transient chain. An irreducible Markov chain having only finitely many states is necessarily recurrent. It is generally difficult to decide whether an irreducible chain having infinitely many states is recurrent or transient. We are able to do so, however, for the birth and death chain.


Consider a birth and death chain on the nonnegative integers or on the finite set {0, . . . , d}. In the former case we set d = ∞. The transition function is of the form

    P(x, y) = q_x,    y = x − 1,
    P(x, y) = r_x,    y = x,
    P(x, y) = p_x,    y = x + 1,

where p_x + q_x + r_x = 1 for x ∈ 𝒮, q_0 = 0, and p_d = 0 if d < ∞. We assume additionally that p_x and q_x are positive for 0 < x < d.

For a and b in 𝒮 such that a < b, set

    u(x) = P_x(T_a < T_b),    a < x < b,

and set u(a) = 1 and u(b) = 0. If the birth and death chain starts at y, then in one step it goes to y − 1, y, or y + 1 with respective probabilities q_y, r_y, or p_y. It follows that

(55)    u(y) = q_y u(y − 1) + r_y u(y) + p_y u(y + 1),    a < y < b.

Since r_y = 1 − p_y − q_y, we can rewrite (55) as

(56)    u(y + 1) − u(y) = (q_y/p_y)(u(y) − u(y − 1)),    a < y < b.

Set γ_0 = 1 and

(57)    γ_y = (q_1 · · · q_y)/(p_1 · · · p_y),    0 < y < d.

From (56) we see that

    u(y + 1) − u(y) = (γ_y/γ_{y−1})(u(y) − u(y − 1)),    a < y < b,

from which it follows that

    u(y + 1) − u(y) = (γ_{a+1}/γ_a) · · · (γ_y/γ_{y−1}) (u(a + 1) − u(a))
                    = (γ_y/γ_a)(u(a + 1) − u(a)).

Consequently,

(58)    u(y) − u(y + 1) = (γ_y/γ_a)(u(a) − u(a + 1)),    a ≤ y < b.

Summing (58) on y = a, . . . , b − 1 and recalling that u(a) = 1 and u(b) = 0, we conclude that

    u(a) − u(a + 1) = γ_a / Σ_{y=a}^{b−1} γ_y.


Thus (58) becomes

    u(y) − u(y + 1) = γ_y / Σ_{y=a}^{b−1} γ_y,    a ≤ y < b.

Summing this equation on y = x, . . . , b − 1 and again using the formula u(b) = 0, we obtain

    u(x) = (Σ_{y=x}^{b−1} γ_y) / (Σ_{y=a}^{b−1} γ_y),    a ≤ x ≤ b.

It now follows from the definition of u(x) that

(59)    P_x(T_a < T_b) = (Σ_{y=x}^{b−1} γ_y) / (Σ_{y=a}^{b−1} γ_y),    a ≤ x ≤ b.

By subtracting both sides of (59) from 1, we see that

(60)    P_x(T_b < T_a) = (Σ_{y=a}^{x−1} γ_y) / (Σ_{y=a}^{b−1} γ_y),    a ≤ x ≤ b.

Example 12. A gambler playing roulette makes a series of one dollar bets. He has respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as his net winnings reach 25 dollars or his net losses reach 10 dollars.

(a) Find the probability that when he quits playing he will have won 25 dollars.

(b) Find his expected loss.

The problem fits into our scheme if we let X_n denote the capital of the gambler at time n, with X_0 = 10. Then X_n, n ≥ 0, forms a birth and death chain on {0, 1, . . . , 35} with birth and death rates

    p_x = 9/19    and    q_x = 10/19,    0 < x < 35.

States 0 and 35 are absorbing states. Formula (60) is applicable with a = 0, x = 10, and b = 35. We see that

    γ_y = (10/9)^y,    0 ≤ y ≤ 34,

and hence that

    P_{10}(T_35 < T_0) = (Σ_{y=0}^{9} (10/9)^y) / (Σ_{y=0}^{34} (10/9)^y)
                       = ((10/9)^{10} − 1) / ((10/9)^{35} − 1) = .047.

Thus the gambler has probability .047 of winning 25 dollars. His expected loss in dollars is 10 − 35(.047), which equals $8.36.
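The arithmetic in Example 12 is easy to reproduce; this sketch (not from the text) evaluates formula (60) directly. Without intermediate rounding it gives a winning probability of about .0480 and an expected loss of about $8.32; the text's $8.36 comes from first rounding the probability to .047.

```python
# Example 12 via formula (60): gamma_y = (q_x/p_x)^y = (10/9)^y for the
# roulette gambler's ruin chain on {0, ..., 35} started at x = 10.
p, q = 9 / 19, 10 / 19
a, x, b = 0, 10, 35
gamma = [(q / p) ** y for y in range(b)]       # gamma_0, ..., gamma_{b-1}
p_win = sum(gamma[a:x]) / sum(gamma[a:b])      # P_10(T_35 < T_0) by (60)
expected_loss = x - b * p_win
print(round(p_win, 4), round(expected_loss, 2))   # 0.048 8.32
```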


In the remainder of this section we consider a birth and death chain on the nonnegative integers which is irreducible, i.e., such that p_x > 0 for x ≥ 0 and q_x > 0 for x ≥ 1. We will determine when such a chain is recurrent and when it is transient.

As a special case of (59),

(61)    P_1(T_0 < T_n) = 1 − 1/Σ_{y=0}^{n−1} γ_y,    n > 1.

Consider now a birth and death chain starting in state 1. Since the birth and death chain can move at most one step to the right at a time (considering the transition from state to state as movement along the real number line),

(62)    T_2 < T_3 < T_4 < · · · .

It follows from (62) that {T_0 < T_n}, n > 1, forms a nondecreasing sequence of events. We conclude from Theorem 1 of Chapter 1 of Volume I¹ that

(63)    lim_{n→∞} P_1(T_0 < T_n) = P_1(T_0 < T_n for some n > 1).

Equation (62) implies that T_n ≥ n − 1 and thus T_n → ∞ as n → ∞; hence the event {T_0 < T_n for some n > 1} occurs if and only if the event {T_0 < ∞} occurs. We can therefore rewrite (63) as

(64)    lim_{n→∞} P_1(T_0 < T_n) = P_1(T_0 < ∞).

It follows from (61) and (64) that

(65)    P_1(T_0 < ∞) = 1 − 1/Σ_{y=0}^{∞} γ_y.

We are now in position to show that the birth and death chain is recurrent if and only if

(66)    Σ_{y=0}^{∞} γ_y = ∞.

If the birth and death chain is recurrent, then P_1(T_0 < ∞) = 1 and (66) follows from (65). To obtain the converse, we observe that P(0, y) = 0 for y ≥ 2, and hence

(67)    P_0(T_0 < ∞) = P(0, 0) + P(0, 1)P_1(T_0 < ∞).

¹ Paul G. Hoel, Sidney C. Port, and Charles J. Stone, Introduction to Probability Theory (Boston: Houghton Mifflin Co., 1971), p. 13.


Suppose (66) holds. Then by (65)

    P_1(T_0 < ∞) = 1.

From this and (67) we conclude that

    P_0(T_0 < ∞) = P(0, 0) + P(0, 1) = 1.

Thus 0 is a recurrent state, and since the chain is assumed to be irreducible, it must be a recurrent chain.

In summary, we have shown that an irreducible birth and death chain on {0, 1, 2, . . . } is recurrent if and only if

(68)    Σ_{x=1}^{∞} (q_1 · · · q_x)/(p_1 · · · p_x) = ∞.

Example 13. Consider the birth and death chain on {0, 1, 2, . . . } defined by

    p_x = (x + 2)/(2(x + 1))    and    q_x = x/(2(x + 1)),    x ≥ 0.

Determine whether this chain is recurrent or transient.

Since

    q_x/p_x = x/(x + 2),

it follows that

    γ_x = (q_1 · · · q_x)/(p_1 · · · p_x) = (1 · 2 · · · x)/(3 · 4 · · · (x + 2))
        = 2/((x + 1)(x + 2)) = 2(1/(x + 1) − 1/(x + 2)).

Thus

    Σ_{x=1}^{∞} γ_x = 2 Σ_{x=1}^{∞} (1/(x + 1) − 1/(x + 2))
                    = 2(1/2 − 1/3 + 1/3 − 1/4 + 1/4 − 1/5 + · · ·) = 2 · 1/2 = 1.

We conclude that the chain is transient.
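A numerical check of Example 13 (a sketch, not from the text): the partial sums of γ_x approach 1, so Σ_{y=0}^{∞} γ_y = γ_0 + 1 = 2 < ∞, and by (65) the chain started at 1 returns to 0 with probability only 1 − 1/2 = 1/2.

```python
# Example 13: gamma_x = prod_{i=1..x} q_i/p_i telescopes to 2/((x+1)(x+2)),
# so the series sum_{y>=0} gamma_y converges to 2.  By (65) the chain is
# transient, with P_1(T_0 < infinity) = 1 - 1/2 = 1/2.
def p(x): return (x + 2) / (2 * (x + 1))
def q(x): return x / (2 * (x + 1))

gamma, total = 1.0, 1.0          # gamma_0 = 1
for x in range(1, 200_000):
    gamma *= q(x) / p(x)         # gamma_x built up from gamma_{x-1}
    total += gamma

assert abs(total - 2.0) < 1e-4   # partial sums approach 2 < infinity
print(round(1 - 1 / total, 3))   # P_1(T_0 < infinity), approximately 0.5
```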

1.8. Branching and queuing chains

In this section we will describe which branching chains are certain of extinction and which are not. We will also describe which queuing chains


are transient and which are recurrent. The proofs of these results are somewhat complicated and will be given in the appendix to this chapter. These proofs can be skipped with no loss of continuity. It is interesting to note that the proofs of the results for the branching chain and the queuing chain are very similar, whereas the results themselves appear quite dissimilar.

1.8.1. Branching chain. Consider the branching chain introduced in Example 6. The extinction probability p of the chain is the probability that the descendants of a given particle eventually become extinct. Clearly p = P_{10}.

Suppose there are x particles present initially. Since the numbers of offspring of these particles in the various generations are chosen independently of each other, the probability P_{x0} that the descendants of each of the x particles eventually become extinct is just the xth power of the probability that the descendants of any one particle eventually become extinct. In other words,

(69)    P_{x0} = p^x,    x = 1, 2, . . . .

Recall from Example 6 that a particle gives rise to ξ particles in the next generation, where ξ is a random variable having density f. If f(1) = 1, the branching chain is degenerate in that every state is an absorbing state. Thus we suppose that f(1) < 1. Then state 0 is an absorbing state. It is left as an exercise for the reader to show that every state other than 0 is transient. From this it follows that, with probability one, the branching chain is either absorbed at 0 or approaches +∞. We conclude from (69) that

    P_x(lim_{n→∞} X_n = ∞) = 1 − p^x,    x = 1, 2, . . . .

Clearly it is worthwhile to determine p, or at least to determine when p = 1 and when p < 1. This can be done using arguments based upon the formula

(70)    Φ(p) = p,

where Φ is the probability generating function of f, defined by

    Φ(t) = f(0) + Σ_{y=1}^{∞} f(y)t^y,    0 ≤ t ≤ 1.


To verify (70) we observe that (see Exercise 9(b))

    p = P_{10} = P(1, 0) + Σ_{y=1}^{∞} P(1, y)P_{y0}
               = P(1, 0) + Σ_{y=1}^{∞} P(1, y)p^y
               = f(0) + Σ_{y=1}^{∞} f(y)p^y
               = Φ(p).

Let μ denote the expected number of offspring of any given particle. Suppose μ ≤ 1. Then the equation Φ(t) = t has no roots in [0, 1) (under our assumption that f(1) < 1), and hence p = 1. Thus ultimate extinction is certain if μ ≤ 1 and f(1) < 1.

Suppose instead that μ > 1. Then the equation Φ(t) = t has a unique root p_0 in [0, 1), and hence p equals either p_0 or 1. Actually p always equals p_0. Consequently, if μ > 1 the probability of ultimate extinction is less than one.

The proofs of these results will be given in the appendix. The results themselves are intuitively very reasonable. If μ < 1, then on the average each particle gives rise to fewer than one new particle, so we would expect the population to die out eventually. If μ > 1, then on the average each particle gives rise to more than one new particle. In this case we would expect that the population has positive probability of growing rapidly, indeed geometrically fast, as time goes on. The case μ = 1 is borderline; but since p = 1 when μ < 1, it is plausible by "continuity" that p = 1 also when μ = 1.

Example 14. Suppose that every man in a certain society has exactly three children, which independently have probability one-half of being a boy and one-half of being a girl. Suppose also that the number of males in the nth generation forms a branching chain. Find the probability that the male line of a given man eventually becomes extinct.

The density f of the number of male children of a given man is the binomial density with parameters n = 3 and p = 1/2. Thus f(0) = 1/8, f(1) = 3/8, f(2) = 3/8, f(3) = 1/8, and f(x) = 0 for x ≥ 4. The mean number of male offspring is μ = 3/2. Since μ > 1, the extinction probability p is the root of the equation

    1/8 + (3/8)t + (3/8)t² + (1/8)t³ = t


lying in [0, 1). We can rewrite this equation as

    t³ + 3t² − 5t + 1 = 0,

or equivalently as

    (t − 1)(t² + 4t − 1) = 0.

This equation has three roots, namely, 1, −√5 − 2, and √5 − 2. Consequently, p = √5 − 2.
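The root can also be found numerically by the iteration that underlies the appendix proof: t_{n+1} = Φ(t_n), starting from t_0 = 0, increases to the extinction probability. A sketch (not from the text):

```python
# Example 14: iterate t <- Phi(t) from t = 0, where
# Phi(t) = 1/8 + (3/8)t + (3/8)t^2 + (1/8)t^3 = ((1 + t)^3)/8
# is the probability generating function of f.  The iterates are
# P_1(T_0 <= n) and increase to the extinction probability p = sqrt(5) - 2.
def phi(t):
    return (1 + 3 * t + 3 * t ** 2 + t ** 3) / 8

t = 0.0
for _ in range(200):
    t = phi(t)

assert abs(t - (5 ** 0.5 - 2)) < 1e-9
print(round(t, 4))               # 0.2361
```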

1.8.2. Queuing chain. Consider the queuing chain introduced in Example 5. Let ξ_1, ξ_2, . . . and f be as in that example. In this section we will indicate when the queuing chain is recurrent and when it is transient.

Let μ denote the expected number of customers arriving in unit time. Suppose first that μ > 1. Since at most one person is served at a time and on the average more than one new customer enters the queue at a time, it would appear that as time goes on more and more people will be waiting for service and that the queue length will approach infinity. This is indeed the case, so that if μ > 1 the queuing chain is transient.

In discussing the case μ ≤ 1, we will assume that the chain is irreducible (see Exercises 37 and 38 for necessary and sufficient conditions for irreducibility and for results when the queuing chain is not irreducible). Suppose first that μ < 1. Then on the average fewer than one new customer will enter the queue in unit time. Since one customer is served whenever the queue is nonempty, we would expect that, regardless of the initial length of the queue, it will become empty at some future time. This is indeed the case and, in particular, 0 is a recurrent state. The case μ = 1 is borderline, but again it turns out that 0 is a recurrent state. Thus if μ ≤ 1 and the queuing chain is irreducible, it is recurrent.

The proof of these results will be given in the appendix.
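As an illustration (not from the text), the queuing chain X_{n+1} = max(X_n − 1, 0) + ξ_{n+1} can be simulated for an arrival density with μ < 1; the choice f(0) = 0.6, f(2) = 0.4 (so μ = 0.8, and the chain is irreducible by Exercise 37) is arbitrary. A standard balance argument suggests that in the long run the queue is empty a fraction 1 − μ of the time, so the chain keeps returning to 0.

```python
# Seeded simulation of the queuing chain X_{n+1} = max(X_n - 1, 0) + xi,
# with P(xi = 0) = 0.6 and P(xi = 2) = 0.4, so mu = 0.8 < 1 (recurrent case).
import random

random.seed(1)
x, visits, steps = 0, 0, 200_000
for _ in range(steps):
    x = max(x - 1, 0) + (0 if random.random() < 0.6 else 2)
    visits += (x == 0)

frac = visits / steps
assert 0.1 < frac < 0.3   # long-run fraction of empty periods, near 1 - mu = 0.2
print(round(frac, 3))
```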

APPENDIX

1.9. Proof of results for the branching and queuing chains

In this section we will verify the results discussed in Section 1.8. To do so we need the following.

Theorem 6. Let Φ be the probability generating function of a nonnegative integer-valued random variable ξ, and set μ = Eξ (with μ = +∞ if ξ does not have finite expectation). If μ ≤ 1 and P(ξ = 1) < 1, the equation

(71)    Φ(t) = t

has no roots in [0, 1). If μ > 1, then (71) has a unique root p_0 in [0, 1).


Graphs of Φ(t), 0 ≤ t ≤ 1, in three typical cases corresponding to μ < 1, μ = 1, and μ > 1 are shown in Figure 2. The fact that μ is the left-hand derivative of Φ(t) at t = 1 plays a fundamental role in the proof of Theorem 6.

[Figure 2: graphs of y = Φ(t) against y = t in the cases μ < 1, μ = 1, and μ > 1; for μ > 1 the curve crosses the line y = t at p_0 < 1.]

Proof. Let f denote the density of ξ. Then

    Φ(t) = f(0) + f(1)t + f(2)t² + · · ·

and

    Φ′(t) = f(1) + 2f(2)t + 3f(3)t² + · · · .

Thus Φ(0) = f(0), Φ(1) = 1, and

    lim_{t→1} Φ′(t) = f(1) + 2f(2) + 3f(3) + · · · = μ.

Suppose first that μ < 1. Then

    lim_{t→1} Φ′(t) < 1.

Since Φ′(t) is nondecreasing in t, 0 < t < 1, we conclude that Φ′(t) < 1 for 0 ≤ t < 1. Suppose next that μ = 1 and f(1) = P(ξ = 1) < 1. Then f(n) > 0 for some n ≥ 2 (otherwise f(0) + f(1) = 1 and μ = f(1) < 1, a contradiction). Therefore Φ′(t) is strictly increasing in t, 0 < t < 1. Since

    lim_{t→1} Φ′(t) = 1,

we again conclude that Φ′(t) < 1 for 0 ≤ t < 1.

Suppose now that μ ≤ 1 and P(ξ = 1) < 1. We have shown that Φ′(t) < 1 for 0 ≤ t < 1. Thus

    (d/dt)(Φ(t) − t) < 0,    0 ≤ t < 1,

and hence Φ(t) − t is strictly decreasing on [0, 1]. Since Φ(1) − 1 = 0, we see that Φ(t) − t > 0 for 0 ≤ t < 1, and hence that (71) has no roots in [0, 1). This proves the first part of the theorem.

Suppose next that μ > 1. Then

    lim_{t→1} Φ′(t) > 1,

so by the continuity of Φ′ there is a number t_0 such that 0 < t_0 < 1 and Φ′(t) > 1 for t_0 ≤ t < 1. It follows from the mean value theorem that

    (Φ(1) − Φ(t_0))/(1 − t_0) > 1.

Since Φ(1) = 1, we conclude that Φ(t_0) − t_0 < 0. Now Φ(t) − t is continuous in t and nonnegative at t = 0, so by the intermediate value theorem it must have a zero p_0 on [0, t_0). Thus (71) has a root p_0 in [0, 1). We will complete the proof of the theorem by showing that there is only one such root.

Suppose that 0 ≤ p_0 < p_1 < 1, Φ(p_0) = p_0, and Φ(p_1) = p_1. Then the function Φ(t) − t vanishes at p_0, p_1, and 1; hence by Rolle's theorem its first derivative has at least two roots in (0, 1). By another application of Rolle's theorem its second derivative Φ″(t) has at least one root in (0, 1). But if μ > 1, then at least one of the numbers f(2), f(3), . . . is strictly positive, and hence

    Φ″(t) = 2f(2) + 3·2f(3)t + · · ·

has no roots in (0, 1). This contradiction shows that Φ(t) = t has a unique root in [0, 1). ∎

1.9.1. Branching chain. Using Theorem 6 we see that the results for μ ≤ 1 follow as indicated in Section 1.8.1.

Suppose μ > 1. It follows from Theorem 6 that p equals p_0 or 1, where p_0 is the unique root of the equation Φ(t) = t in [0, 1). We will show that p always equals p_0.

First we observe that, since the initial particles act independently in giving rise to their offspring, the probability P_y(T_0 ≤ n) that the descendants of each of the y ≥ 1 particles become extinct by time n is given by

    P_y(T_0 ≤ n) = (P_1(T_0 ≤ n))^y.

Consequently, for n ≥ 0, by Exercise 9(a)

    P_1(T_0 ≤ n + 1) = P(1, 0) + Σ_{y=1}^{∞} P(1, y)P_y(T_0 ≤ n)
                     = P(1, 0) + Σ_{y=1}^{∞} P(1, y)(P_1(T_0 ≤ n))^y
                     = f(0) + Σ_{y=1}^{∞} f(y)(P_1(T_0 ≤ n))^y,


and hence

(72)    P_1(T_0 ≤ n + 1) = Φ(P_1(T_0 ≤ n)),    n ≥ 0.

We will use (72) to prove by induction that

(73)    P_1(T_0 ≤ n) ≤ p_0,    n ≥ 0.

Now

    P_1(T_0 ≤ 0) = 0 ≤ p_0,

so that (73) is true for n = 0. Suppose that (73) holds for a given value of n. Since Φ(t) is increasing in t, we conclude from (72) that

    P_1(T_0 ≤ n + 1) = Φ(P_1(T_0 ≤ n)) ≤ Φ(p_0) = p_0,

and thus (73) holds for the next value of n. By induction (73) is true for all n ≥ 0.

By letting n → ∞ in (73) we see that

    p = P_1(T_0 < ∞) = lim_{n→∞} P_1(T_0 ≤ n) ≤ p_0.

Since p is one of the two numbers p_0 or 1, it must be the number p_0.

1.9.2. Queuing chain. We will now verify the results of Section 1.8.2. Let ξ_n denote the number of customers arriving during the nth time period. Then ξ_1, ξ_2, . . . are independent random variables having common density f, mean μ, and probability generating function Φ.

It follows from Exercise 9(b) and the identity P(0, z) = P(1, z), valid for a queuing chain, that P_{00} = P_{10}. We will show that the number p = P_{00} = P_{10} satisfies the equation

(74)    Φ(p) = p.

If 0 is a recurrent state, p = 1 and (74) follows immediately from the fact that Φ(1) = 1. To verify (74) in general, we observe first that by Exercise 9(b)

    P_{00} = P(0, 0) + Σ_{y=1}^{∞} P(0, y)P_{y0},

i.e., that

(75)    p = f(0) + Σ_{y=1}^{∞} f(y)P_{y0}.

In order to compute P_{y0}, y = 1, 2, . . . , we consider a queuing chain starting at the positive integer y. For n = 1, 2, . . . , the event {T_{y−1} = n} occurs if and only if

    n = min(m > 0 : y + (ξ_1 − 1) + · · · + (ξ_m − 1) = y − 1)
      = min(m > 0 : ξ_1 + · · · + ξ_m = m − 1),


that is, if and only if n is the smallest positive integer m such that the number of new customers entering the queue by time m is one less than the number served by time m. Thus P_y(T_{y−1} = n) is independent of y, and consequently P_{y,y−1} = P_y(T_{y−1} < ∞) is independent of y for y = 1, 2, . . . . Since P_{10} = p, we see that

    P_{y,y−1} = P_{y−1,y−2} = · · · = P_{10} = p.

Now the queuing chain can go at most one step to the left at a time, so in order to go from state y > 0 to state 0 it must pass through all the intervening states y − 1, . . . , 1. By applying the Markov property we can conclude (see Exercise 39) that

(76)    P_{y0} = P_{y,y−1}P_{y−1,y−2} · · · P_{10} = p^y.

It follows from (75) and (76) that

    p = f(0) + Σ_{y=1}^{∞} f(y)p^y = Φ(p),

so that (74) holds.

Using (74) and Theorem 6 it is easy to see that if μ ≤ 1 and the queuing chain is irreducible, then the chain is recurrent. For p satisfies (74), and by Theorem 6 this equation has no roots in [0, 1) (observe that P(ξ_1 = 1) < 1 if the queuing chain is irreducible). We conclude that p = 1. Since P_{00} = p, state 0 is recurrent, and thus since the chain is irreducible, all states are recurrent.

Suppose now that μ > 1. Again p satisfies (74) which, by Theorem 6, has a unique root p_0 in [0, 1). Thus p equals either p_0 or 1. We will prove that p = p_0.

To this end we first observe that by Exercise 9(a)

    P_1(T_0 ≤ n + 1) = P(1, 0) + Σ_{y=1}^{∞} P(1, y)P_y(T_0 ≤ n),

which can be rewritten as

(77)    P_1(T_0 ≤ n + 1) = f(0) + Σ_{y=1}^{∞} f(y)P_y(T_0 ≤ n).

We claim next that

(78)    P_y(T_0 ≤ n) ≤ (P_1(T_0 ≤ n))^y,    y ≥ 1 and n ≥ 0.

To verify (78), observe that if a queuing chain starting at y reaches 0 in n or fewer steps, it must reach y − 1 in n or fewer steps, go from y − 1 to y − 2 in n or fewer steps, etc. By applying the Markov property we can conclude (see Exercise 39) that

(79)    P_y(T_0 ≤ n) ≤ P_y(T_{y−1} ≤ n)P_{y−1}(T_{y−2} ≤ n) · · · P_1(T_0 ≤ n).


Since

    P_z(T_{z−1} ≤ n) = P_1(T_0 ≤ n),    1 ≤ z ≤ y,

(78) is valid. It follows from (77) and (78) that

    P_1(T_0 ≤ n + 1) ≤ f(0) + Σ_{y=1}^{∞} f(y)(P_1(T_0 ≤ n))^y,

i.e., that

(80)    P_1(T_0 ≤ n + 1) ≤ Φ(P_1(T_0 ≤ n)),    n ≥ 0.

This in turn implies that

(81)    P_1(T_0 ≤ n) ≤ p_0,    n ≥ 0,

by a proof that is almost identical to the proof that (72) implies (73) (the slight changes needed are left as an exercise for the reader). Just as in the proof of the corresponding result for the branching chain, we see by letting n → ∞ in (81) that p ≤ p_0 and hence that p = p_0.

We have shown that if μ > 1, then P_{00} = p < 1, and hence 0 is a transient state. It follows that if μ > 1 and the chain is irreducible, then all states are transient. If μ > 1 and the queuing chain is not irreducible, then case (d) of Exercise 38 holds (why?), and it is left to the reader to show that again all states are transient.

Exercises

1 Let X_n, n ≥ 0, be the two-state Markov chain. Find
(a) P(X_1 = 0 | X_0 = 0 and X_2 = 0);
(b) P(X_1 ≠ X_2).

2 Suppose we have two boxes and 2d balls, of which d are black and d are red. Initially, d of the balls are placed in box 1, and the remainder of the balls are placed in box 2. At each trial a ball is chosen at random from each of the boxes, and the two balls are put back in the opposite boxes. Let X_0 denote the number of black balls initially in box 1 and, for n ≥ 1, let X_n denote the number of black balls in box 1 after the nth trial. Find the transition function of the Markov chain X_n, n ≥ 0.

3 Let the queuing chain be modified by supposing that if there are one or more customers waiting to be served at the start of a period, there is probability p that one customer will be served during that period and probability 1 − p that no customers will be served during that period. Find the transition function for this modified queuing chain.


4 Consider a probability space (Ω, 𝒜, P) and assume that the various sets mentioned below are all in 𝒜.
(a) Show that if the D_i are disjoint and P(C | D_i) = p independently of i, then P(C | ∪_i D_i) = p.
(b) Show that if the C_i are disjoint, then P(∪_i C_i | D) = Σ_i P(C_i | D).
(c) Show that if the E_i are disjoint and ∪_i E_i = Ω, then

    P(C | D) = Σ_i P(E_i | D)P(C | E_i ∩ D).

(d) Show that if the C_i are disjoint and P(A | C_i) = P(B | C_i) for all i, then P(A | ∪_i C_i) = P(B | ∪_i C_i).

5 Let X_n, n ≥ 0, be the two-state Markov chain.
(a) Find P_0(T_0 = n).
(b) Find P_0(T_1 = n).

6 Let X_n, n ≥ 0, be the Ehrenfest chain and suppose that X_0 has a binomial distribution with parameters d and 1/2, i.e.,

    P(X_0 = x) = (d choose x)/2^d,    x = 0, . . . , d.

Find the distribution of X_1.

7 Let X_n, n ≥ 0, be a Markov chain. Show that

    P(X_0 = x_0 | X_1 = x_1, . . . , X_n = x_n) = P(X_0 = x_0 | X_1 = x_1).

8 Let x and y be distinct states of a Markov chain having d < ∞ states, and suppose that x leads to y. Let n_0 be the smallest positive integer such that P^{n_0}(x, y) > 0, and let x_1, . . . , x_{n_0−1} be states such that

    P(x, x_1)P(x_1, x_2) · · · P(x_{n_0−2}, x_{n_0−1})P(x_{n_0−1}, y) > 0.

(a) Show that x, x_1, . . . , x_{n_0−1}, y are distinct states.
(b) Use (a) to show that n_0 ≤ d − 1.
(c) Conclude that P_x(T_y ≤ d − 1) > 0.

9 Use (29) to verify the following identities:
(a) P_x(T_y ≤ n + 1) = P(x, y) + Σ_{z≠y} P(x, z)P_z(T_y ≤ n),    n ≥ 0;
(b) P_{xy} = P(x, y) + Σ_{z≠y} P(x, z)P_{zy}.

10 Consider the Ehrenfest chain with d = 3.
(a) Find P_x(T_0 = n) for x ∈ 𝒮 and 1 ≤ n ≤ 3.
(b) Find P, P², and P³.
(c) Let π_0 be the uniform distribution (1/4, 1/4, 1/4, 1/4). Find π_1, π_2, and π_3.


11 Consider the genetics chain from Example 7 with d = 3.
(a) Find the transition matrices P and P².
(b) If π_0 = (0, 1/2, 1/2, 0), find π_1 and π_2.
(c) Find P_x(T_{{0,3}} = n), x ∈ 𝒮, for n = 1 and n = 2.

12 Consider the Markov chain having state space {0, 1, 2} and transition matrix

            0      1     2
    0  ⎡    0      1     0  ⎤
    1  ⎢  1 − p    0     p  ⎥
    2  ⎣    0      1     0  ⎦

(a) Find P².
(b) Show that P⁴ = P².
(c) Find P^n, n ≥ 1.

13 Let X_n, n ≥ 0, be a Markov chain whose state space 𝒮 is a subset of {0, 1, 2, . . . } and whose transition function P is such that

    Σ_y y P(x, y) = Ax + B,    x ∈ 𝒮,

for some constants A and B.
(a) Show that E X_{n+1} = A E X_n + B.
(b) Show that if A ≠ 1, then

    E X_n = B/(1 − A) + A^n (E X_0 − B/(1 − A)).

14 Let X_n, n ≥ 0, be the Ehrenfest chain on {0, 1, . . . , d}. Show that the assumption of Exercise 13 holds, and use that exercise to compute E_x(X_n).

15 Let y be a transient state. Use (36) to show that for all x

    Σ_{n=0}^{∞} P^n(x, y) ≤ Σ_{n=0}^{∞} P^n(y, y).

16 Show that P_{xy} > 0 if and only if P^n(x, y) > 0 for some positive integer n.

17 Show that if x leads to y and y leads to z, then x leads to z.

18 Consider a Markov chain on the nonnegative integers such that, starting from x, the chain goes to state x + 1 with probability p, 0 < p < 1, and goes to state 0 with probability 1 − p.
(a) Show that this chain is irreducible.
(b) Find P_0(T_0 = n), n ≥ 1.
(c) Show that the chain is recurrent.


19 Consider a Markov chain having state space {0, 1, . . . , 6} and transition matrix

            0     1     2     3     4     5     6
    0  ⎡   1/4    0    1/4   1/4   1/4    0     0  ⎤
    1  ⎢    0     0     1     0     0     0     0  ⎥
    2  ⎢    0     0     0     1     0     0     0  ⎥
    3  ⎢    0     1     0     0     0     0     0  ⎥
    4  ⎢    0     0     0     0    1/2    0    1/2 ⎥
    5  ⎢    0     0     0     0    1/2   1/2    0  ⎥
    6  ⎣    0     0     0     0     0    1/2   1/2 ⎦

(a) Determine which states are transient and which states are recurrent.
(b) Find P_{0y}, y = 0, . . . , 6.

20 Consider the Markov chain on {0, 1, . . . , 5} having transition matrix

            0     1     2     3     4     5
    0  ⎡   1/2   1/2    0     0     0     0  ⎤
    1  ⎢   1/2   1/2    0     0     0     0  ⎥
    2  ⎢    0     0    1/8    0    7/8    0  ⎥
    3  ⎢   1/4   1/4    0     0    1/4   1/4 ⎥
    4  ⎢    0     0    3/4    0    1/4    0  ⎥
    5  ⎣    0    1/5    0    1/5   1/5   2/5 ⎦

(a) Determine which states are transient and which are recurrent.
(b) Find P_{{0,1}}(x), x = 0, . . . , 5.

21 Consider a Markov chain on {0, 1, . . . , d} satisfying (51) and having no absorbing states other than 0 and d. Show that the states 1, . . . , d − 1 each lead to 0, and hence that each is a transient state.

22 Show that the genetics chain introduced in Example 7 satisfies Equation (51).

23 A certain Markov chain that arises in genetics has states 0, 1, . . . , 2d and transition function

    P(x, y) = (2d choose y)(x/2d)^y (1 − x/2d)^{2d−y}.

Find P_{{0}}(x), 0 ≤ x ≤ 2d.

24 Consider a gambler's ruin chain on {0, 1, . . . , d}. Find

    P_x(T_0 < T_d),    0 < x < d.

25 A gambler playing roulette makes a series of one dollar bets. He has respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as he either is one dollar ahead or has lost his initial capital of $1000.
(a) Find the probability that when he quits playing he will have lost $1000.
(b) Find his expected loss.


26 Consider a birth and death chain on the nonnegative integers such that p_x > 0 for x ≥ 0 and q_x > 0 for x ≥ 1.
(a) Show that if Σ_{y=0}^{∞} γ_y = ∞, then P_{x0} = 1, x ≥ 1.
(b) Show that if Σ_{y=0}^{∞} γ_y < ∞, then

    P_{x0} = (Σ_{y=x}^{∞} γ_y)/(Σ_{y=0}^{∞} γ_y),    x ≥ 1.

27 Consider a gambler's ruin chain on {0, 1, 2, . . . }.
(a) Show that if q ≥ p, then P_{x0} = 1, x ≥ 1.
(b) Show that if q < p, then P_{x0} = (q/p)^x, x ≥ 1.
Hint: Use Exercise 26.

28 Consider an irreducible birth and death chain on the nonnegative integers. Show that if p_x ≤ q_x for x ≥ 1, the chain is recurrent.

29 Consider an irreducible birth and death chain on the nonnegative integers such that

    q_x/p_x = (x/(x + 1))²,    x ≥ 1.

(a) Show that this chain is transient.
(b) Find P_{x0}, x ≥ 1. Hint: Use Exercise 26 and the formula Σ_{y=1}^{∞} 1/y² = π²/6.

30 Consider the birth and death chain in Example 13.
(a) Compute P_x(T_a < T_b) for a < x < b.
(b) Compute P_{x0}, x > 0.

31 Consider a branching chain such that f(1) < 1. Show that every state other than 0 is transient.

32 Consider the branching chain described in Example 14. If a given man has two boys and one girl, what is the probability that his male line will continue forever?

33 Consider a branching chain with f(0) = f(3) = 1/2. Find the probability ρ of extinction.

34 Consider a branching chain with f(x) = p(1 − p)^x, x ≥ 0, where 0 < p < 1. Show that the extinction probability ρ = 1 if p ≥ 1/2 and that ρ = p/(1 − p) if p < 1/2.

35 Let X_n, n ≥ 0, be a branching chain. Show that E_x(X_n) = xμ^n. Hint: See Exercise 13.

36 Let X_n, n ≥ 0, be a branching chain and suppose that the associated random variable ξ has finite variance σ².
(a) Show that

    E[X²_{n+1} | X_n = x] = xσ² + x²μ².

(b) Use Exercise 35 to show that

    E_x(X²_{n+1}) = xμ^n σ² + μ² E_x(X_n²).

Hint: Use the formula E Y = Σ_x P(X = x)E[Y | X = x].


(c) Show that

    E_x(X_n²) = xσ²(μ^{n−1} + · · · + μ^{2(n−1)}) + x²μ^{2n},    n ≥ 1.

(d) Show that if there are x particles initially, then for n ≥ 1

    Var_x(X_n) = xσ²μ^{n−1}(1 − μ^n)/(1 − μ)   if μ ≠ 1,
    Var_x(X_n) = nxσ²                           if μ = 1.

37 Consider the queuing chain.
(a) Show that if either f(0) = 0 or f(0) + f(1) = 1, the chain is not irreducible.
(b) Show that if f(0) > 0 and f(0) + f(1) < 1, the chain is irreducible. Hint: First verify that (i) P_{xy} > 0 for 0 ≤ y ≤ x; and (ii) if x_0 ≥ 2 and f(x_0) > 0, then P_{0, x_0 + n(x_0 − 1)} > 0 for n ≥ 0.

38 Determine which states of the queuing chain are absorbing, which are recurrent, and which are transient, when the chain is not irreducible. Consider the following four cases separately (see Exercise 37):
(a) f(1) = 1;
(b) f(0) > 0, f(1) > 0, and f(0) + f(1) = 1;
(c) f(0) = 1;
(d) f(0) = 0 and f(1) < 1.

39 Consider the queuing chain.
(a) Show that for y ≥ 2 and m a positive integer

    P_y(T_0 = m) = Σ_{k=1}^{m−1} P_y(T_{y−1} = k)P_{y−1}(T_0 = m − k).

(b) By summing the equation in (a) on m = 1, 2, . . . , show that

    P_{y0} = P_{y,y−1}P_{y−1,0},    y ≥ 2.

(c) Why does Equation (76) follow from (b)?
(d) By summing the equation in (a) on m = 1, 2, . . . , n, show that

    P_y(T_0 ≤ n) ≤ P_y(T_{y−1} ≤ n)P_{y−1}(T_0 ≤ n),    y ≥ 2.

(e) Why does Equation (79) follow from (d)?

40 Verify that (81) follows from (80) by induction.


2 Stationary Distributions of a Markov Chain

Let X_n, n ≥ 0, be a Markov chain having state space 𝒮 and transition function P. If π(x), x ∈ 𝒮, are nonnegative numbers summing to one, and if

(1) Σ_x π(x)P(x, y) = π(y), y ∈ 𝒮,

then π is called a stationary distribution. Suppose that a stationary distribution π exists and that

(2) lim_{n→∞} P^n(x, y) = π(y), y ∈ 𝒮.

Then, as we will soon see, regardless of the initial distribution of the chain, the distribution of X_n approaches π as n → ∞. In such cases, π is sometimes called the steady state distribution.

In this chapter we will determine which Markov chains have stationary distributions, when there is such a unique distribution, and when (2) holds.

2.1. Elementary properties of stationary distributions

Let π be a stationary distribution. Then

Σ_x π(x)P²(x, y) = Σ_x π(x) Σ_z P(x, z)P(z, y)
               = Σ_z (Σ_x π(x)P(x, z)) P(z, y)
               = Σ_z π(z)P(z, y) = π(y).

Similarly, by induction based on the formula

P^{n+1}(x, y) = Σ_z P^n(x, z)P(z, y),

we conclude that for all n

(3) Σ_x π(x)P^n(x, y) = π(y), y ∈ 𝒮.


If X_0 has the stationary distribution π for its initial distribution, then (3) implies that for all n

(4) P(X_n = y) = π(y), y ∈ 𝒮,

and hence that the distribution of X_n is independent of n. Suppose conversely that the distribution of X_n is independent of n. Then the initial distribution π_0 is such that

π_0(y) = P(X_0 = y) = P(X_1 = y) = Σ_x π_0(x)P(x, y).

Consequently π_0 is a stationary distribution. In summary, the distribution of X_n is independent of n if and only if the initial distribution is a stationary distribution.

Suppose now that π is a stationary distribution and that (2) holds. Let π_0 be the initial distribution. Then

(5) P(X_n = y) = Σ_x π_0(x)P^n(x, y), y ∈ 𝒮.

By using (2) and the bounded convergence theorem stated in Section 2.5, we can let n → ∞ in (5), obtaining

lim_{n→∞} P(X_n = y) = Σ_x π_0(x)π(y).

Since Σ_x π_0(x) = 1, we conclude that

(6) lim_{n→∞} P(X_n = y) = π(y), y ∈ 𝒮.

Formula (6) states that, regardless of the initial distribution, for large values of n the distribution of X_n is approximately equal to the stationary distribution π. It implies that π is the unique stationary distribution. For if there were some other stationary distribution we could use it for the initial distribution π_0. From (4) and (6) we would conclude that π_0(y) = π(y), y ∈ 𝒮.
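For a concrete illustration of (6), take the two-state chain of Section 1.1 and apply the transition matrix repeatedly: whatever initial distribution π_0 we start from, π_0P^n settles down to the stationary distribution. The values of p and q below are illustrative choices, not from the text.

```python
# Illustration of (6) for the two-state chain of Section 1.1:
# pi_0 P^n approaches the stationary distribution for every initial pi_0.
# p, q and the initial distributions are illustrative values.

p, q = 0.3, 0.1
P = [[1 - p, p], [q, 1 - q]]
pi = (q / (p + q), p / (p + q))      # stationary distribution of this chain

def step(dist, P):
    """One application of the transition matrix: dist -> dist P."""
    return [sum(dist[x] * P[x][y] for x in range(len(P))) for y in range(len(P))]

for pi0 in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    dist = pi0
    for _ in range(200):
        dist = step(dist, P)
    assert max(abs(dist[y] - pi[y]) for y in range(2)) < 1e-12
```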

Consider a system described by a Markov chain having transition function P and unique stationary distribution π. Suppose we start observing the system after it has been going on for some time, say n_0 units of time for some large positive integer n_0. In effect, we observe Y_n, n ≥ 0, where

Y_n = X_{n_0+n}, n ≥ 0.

The random variables Y_n, n ≥ 0, also form a Markov chain with transition function P. In order to determine unique probabilities for events defined in terms of the Y_n chain, we need to know its initial distribution, which is the same as the distribution of X_{n_0}. In most practical applications it is very hard to determine this distribution exactly. We may have no choice but to assume that Y_n, n ≥ 0, has the stationary distribution π for its initial distribution. This is a reasonable assumption if (2) holds and n_0 is large.

2.2. Examples

In this section we will consider some examples in which we can show directly that a unique stationary distribution exists and find simple formulas for it.

In Section 1.1 we discussed the two-state Markov chain on 𝒮 = {0, 1} having transition matrix

      0       1
0 [ 1 - p     p   ]
1 [   q     1 - q ].

We saw that if p + q > 0, the chain has a unique stationary distribution π, determined by

π(0) = q/(p + q) and π(1) = p/(p + q).

We also saw that if 0 < p + q < 2, then (2) holds.

For Markov chains having a finite number of states, stationary distributions can be found by solving a finite system of linear equations.

Example 1. Consider a Markov chain having state space 𝒮 = {0, 1, 2} and transition matrix

      0     1     2
0 [ 1/3   1/3   1/3 ]
1 [ 1/4   1/2   1/4 ]
2 [ 1/6   1/3   1/2 ].

Show that this chain has a unique stationary distribution π and find π.

Formula (1) in this case gives us the three equations

π(0)/3 + π(1)/4 + π(2)/6 = π(0),
π(0)/3 + π(1)/2 + π(2)/3 = π(1),
π(0)/3 + π(1)/4 + π(2)/2 = π(2).


The condition Σ_x π(x) = 1 gives us the fourth equation

π(0) + π(1) + π(2) = 1.

By subtracting twice the first equation from the second equation, we eliminate the term involving π(2) and find that π(1) = 5π(0)/3. We conclude from the first equation that π(2) = 3π(0)/2. From the fourth equation we now see that

π(0)(1 + 5/3 + 3/2) = 1,

and hence that

π(0) = 6/25.

Thus

π(1) = (5/3)·(6/25) = 2/5

and

π(2) = (3/2)·(6/25) = 9/25.

It is readily seen that these numbers satisfy all four equations. Since they are nonnegative, the unique stationary distribution is given by

π(0) = 6/25, π(1) = 2/5, and π(2) = 9/25.

Though it is not easy to see directly, (2) holds for this chain (see Section 2.7).
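For a finite chain the computation in Example 1 can also be delegated to a linear solver: the equations (1) are linearly dependent, so one of them is replaced by the normalization Σ_x π(x) = 1. A sketch using numpy:

```python
# Solve for the stationary distribution of Example 1 numerically:
# pi P = pi together with pi(0) + pi(1) + pi(2) = 1.
import numpy as np

P = np.array([[1/3, 1/3, 1/3],
              [1/4, 1/2, 1/4],
              [1/6, 1/3, 1/2]])

# (P^T - I) pi = 0; overwrite the last row with the normalization constraint.
A = P.T - np.eye(3)
A[2] = 1.0
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

assert np.allclose(pi, [6/25, 2/5, 9/25])   # matches the hand computation
```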

2.2.1. Birth and death chain. Consider a birth and death chain on {0, 1, ..., d} or on the nonnegative integers. In the latter case we set d = ∞. We assume without further mention that the chain is irreducible, i.e., that

p_x > 0 for 0 ≤ x < d and q_x > 0 for 0 < x ≤ d

if d is finite, and that

p_x > 0 for 0 ≤ x < ∞ and q_x > 0 for 0 < x < ∞

if d is infinite.

Suppose d is infinite. The system of equations

Σ_x π(x)P(x, y) = π(y), y ∈ 𝒮,


becomes

π(0)r_0 + π(1)q_1 = π(0),
π(y - 1)p_{y-1} + π(y)r_y + π(y + 1)q_{y+1} = π(y), y ≥ 1.

Since p_y + q_y + r_y = 1, these equations reduce to

(7) q_1π(1) - p_0π(0) = 0,
    q_{y+1}π(y + 1) - p_yπ(y) = q_yπ(y) - p_{y-1}π(y - 1), y ≥ 1.

It follows easily from (7) and induction that

q_{y+1}π(y + 1) - p_yπ(y) = 0, y ≥ 0,

and hence that

π(y + 1) = (p_y / q_{y+1}) π(y), y ≥ 0.

Consequently,

(8) π(x) = (p_0 ··· p_{x-1} / q_1 ··· q_x) π(0), x ≥ 1.

Set

(9) π_x = 1 for x = 0, and π_x = p_0 ··· p_{x-1} / q_1 ··· q_x for x ≥ 1.

Then (8) can be written as

(10) π(x) = π_x π(0), x ≥ 0.

Conversely, (1) follows from (10). Suppose now that Σ_x π_x < ∞ or, equivalently, that

(11) Σ_{x=1}^∞ p_0 ··· p_{x-1} / q_1 ··· q_x < ∞.

We conclude from (10) that the birth and death chain has a unique stationary distribution, given by

(12) π(x) = π_x / Σ_{y=0}^∞ π_y, x ≥ 0.

Suppose instead that (11) fails to hold, i.e., that

(13) Σ_{x=0}^∞ π_x = ∞.


We conclude from (10) and (13) that any solution to (1) is either identically zero or has infinite sum, and hence that there is no stationary distribution.

In summary, we see that the chain has a stationary distribution if and only if (11) holds, and that the stationary distribution, when it exists, is given by (9) and (12).
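Formulas (9), (12), and (14) translate directly into code. The sketch below computes the stationary distribution of a finite birth and death chain with exact arithmetic; the check uses the rates of the d = 3 Ehrenfest chain treated in Example 2.

```python
# Stationary distribution of a finite birth and death chain via (9) and (14).
# p[x], q[x] hold the birth and death probabilities p_x, q_x.
from fractions import Fraction

def birth_death_stationary(p, q, d):
    """Return [pi(0), ..., pi(d)] using pi_x = (p_0...p_{x-1})/(q_1...q_x)."""
    pi_x = [Fraction(1)]
    for x in range(1, d + 1):
        pi_x.append(pi_x[-1] * p[x - 1] / q[x])
    total = sum(pi_x)
    return [v / total for v in pi_x]

# Check against the Ehrenfest chain with d = 3: p_x = (3 - x)/3, q_x = x/3.
p = [Fraction(3 - x, 3) for x in range(4)]
q = [Fraction(x, 3) for x in range(4)]
pi = birth_death_stationary(p, q, 3)
assert pi == [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
```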

Suppose now that d < ∞. By essentially the same arguments used to obtain (12), we conclude that the unique stationary distribution is given by

(14) π(x) = π_x / Σ_{y=0}^d π_y, 0 ≤ x ≤ d,

where π_x, 0 ≤ x ≤ d, is given by (9).

Example 2. Consider the Ehrenfest chain introduced in Section 1.3 and suppose that d = 3. Find the stationary distribution.

The transition matrix of the chain is

      0     1     2     3
0 [   0     1     0     0  ]
1 [ 1/3     0   2/3     0  ]
2 [   0   2/3     0   1/3  ]
3 [   0     0     1     0  ].

This is an irreducible birth and death chain in which

π_0 = 1, π_1 = p_0/q_1 = 3, π_2 = p_0p_1/(q_1q_2) = 3, and π_3 = p_0p_1p_2/(q_1q_2q_3) = 1.

Thus the unique stationary distribution is given by

π(0) = 1/8, π(1) = 3/8, π(2) = 3/8, and π(3) = 1/8.

Formula (2) does not hold for the chain in Example 2 since P^n(x, x) = 0 for odd values of n. We can modify the Ehrenfest chain slightly and avoid such "periodic" behavior.

Example 3. Modified Ehrenfest chain. Suppose we have two boxes labeled 1 and 2 and d balls labeled 1, 2, ..., d. Initially some of the balls are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ..., d, and the ball labeled by that integer is removed from its box. We now select at random one of the two boxes and put the removed ball into this box. The procedure is repeated indefinitely, the selections being made independently. Let X_n denote the number of balls in box 1 after the nth trial. Then X_n, n ≥ 0, is a Markov chain on 𝒮 = {0, 1, ..., d}. Find the stationary distribution of the chain for d = 3.

The transition matrix of this chain, for d = 3, is

      0     1     2     3
0 [ 1/2   1/2     0     0  ]
1 [ 1/6   1/2   1/3     0  ]
2 [   0   1/3   1/2   1/6  ]
3 [   0     0   1/2   1/2  ].

To see why P is given as indicated, we will compute P(1, y), 0 ≤ y ≤ 3. We start with one ball in box 1 and two balls in box 2. Thus P(1, 0) is the probability that the ball selected is from box 1 and the box selected is box 2. Thus

P(1, 0) = (1/3)·(1/2) = 1/6.

Secondly, P(1, 2) is the probability that the ball selected is from box 2 and the box selected is box 1. Thus

P(1, 2) = (2/3)·(1/2) = 1/3.

Clearly P(1, 3) = 0, since at most one ball is transferred at a time. Finally, P(1, 1) can be obtained by subtracting P(1, 0) + P(1, 2) + P(1, 3) from 1. Alternatively, P(1, 1) is the probability that either the selected ball is from box 1 and the selected box is box 1 or the selected ball is from box 2 and the selected box is box 2. Thus

P(1, 1) = (1/3)·(1/2) + (2/3)·(1/2) = 1/2.

The other probabilities are computed similarly. This Markov chain is an irreducible birth and death chain. It is easily seen that π_x, 0 ≤ x ≤ 3, are the same as in the previous example and hence that the stationary distribution is again given by

π(0) = 1/8, π(1) = 3/8, π(2) = 3/8, and π(3) = 1/8.

It follows from the results in Section 2.7 that (2) holds for the chain in Example 3.
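Both claims in Example 3 — the form of P and the stationary distribution — can be verified mechanically. A sketch with exact arithmetic, building P from the verbal description of the chain:

```python
# Build the modified Ehrenfest matrix for d = 3 from the description
# (pick a ball uniformly, then a box uniformly) and check pi P = pi.
from fractions import Fraction

d = 3
P = [[Fraction(0)] * (d + 1) for _ in range(d + 1)]
for x in range(d + 1):
    down = Fraction(x, d) * Fraction(1, 2)    # ball from box 1, put in box 2
    up = Fraction(d - x, d) * Fraction(1, 2)  # ball from box 2, put in box 1
    if x > 0:
        P[x][x - 1] = down
    if x < d:
        P[x][x + 1] = up
    P[x][x] = 1 - down - up                   # ball returned to its own box

pi = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
for y in range(d + 1):
    assert sum(pi[x] * P[x][y] for x in range(d + 1)) == pi[y]
```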

2.2.2. Particles in a box. A Markov chain that arises in several applied contexts can be described as follows. Suppose that ξ_n particles are added to a box at times n = 1, 2, ..., where ξ_n, n ≥ 1, are independent and have a Poisson distribution with common parameter λ. Suppose that each particle in the box at time n, independently of all the other particles in the box and independently of how particles are added to the box, has probability p < 1 of remaining in the box at time n + 1 and probability q = 1 - p of being removed from the box at time n + 1. Let X_n denote the number of particles in the box at time n. Then X_n, n ≥ 0, is a Markov chain. We will find the stationary distribution of this chain. We will also find an explicit formula for P^n(x, y) and use this formula to show directly that (2) holds.

The same Markov chain can be used to describe a telephone exchange, where ξ_n is the number of new calls starting at time n, q is the probability that a call in progress at time n terminates by time n + 1, and X_n is the number of calls in progress at time n.

We will now analyze this Markov chain. Let R(X_n) denote the number of particles present at time n that remain in the box at time n + 1. Then

X_{n+1} = ξ_{n+1} + R(X_n).

Clearly

P(R(X_n) = z | X_n = x) = (x choose z) p^z (1 - p)^{x-z}, 0 ≤ z ≤ x,

and

P(ξ_n = z) = λ^z e^{-λ} / z!, z ≥ 0.

Since

P(X_{n+1} = y | X_n = x) = Σ_{z=0}^{min(x,y)} P(R(X_n) = z, ξ_{n+1} = y - z | X_n = x)
                        = Σ_{z=0}^{min(x,y)} P(ξ_{n+1} = y - z) P(R(X_n) = z | X_n = x),

we conclude that

(15) P(x, y) = Σ_{z=0}^{min(x,y)} (λ^{y-z} e^{-λ} / (y - z)!) (x choose z) p^z (1 - p)^{x-z}.

It follows from (15) or from the original description of the process that P(x, y) > 0 for all x ≥ 0 and y ≥ 0, and hence that the chain is irreducible.

Suppose X_n has a Poisson distribution with parameter t. Then R(X_n) has a Poisson distribution with parameter pt. For

P(R(X_n) = y) = Σ_{x=y}^∞ P(X_n = x, R(X_n) = y)
             = Σ_{x=y}^∞ P(X_n = x) P(R(X_n) = y | X_n = x)
             = Σ_{x=y}^∞ (t^x e^{-t} / x!) (x choose y) p^y (1 - p)^{x-y}
             = Σ_{x=y}^∞ (t^x e^{-t} / (y! (x - y)!)) p^y (1 - p)^{x-y}
             = ((pt)^y e^{-t} / y!) Σ_{x=y}^∞ (t(1 - p))^{x-y} / (x - y)!
             = ((pt)^y e^{-t} / y!) Σ_{z=0}^∞ (t(1 - p))^z / z!
             = ((pt)^y e^{-t} / y!) e^{t(1-p)}
             = (pt)^y e^{-pt} / y!,

which shows that R(X_n) has the indicated Poisson distribution.


We will now show that the stationary distribution is Poisson with parameter t for suitable t. Let X_0 have such a distribution. Then X_1 = ξ_1 + R(X_0) is the sum of independent random variables having Poisson distributions with parameters λ and pt respectively. Thus X_1 has a Poisson distribution with parameter λ + pt. The distribution of X_1 will agree with that of X_0 if t = λ + pt, i.e., if

t = λ/(1 - p) = λ/q.

We conclude that the Markov chain has a stationary distribution π which is a Poisson distribution with parameter λ/q, i.e., such that

(16) π(x) = (λ/q)^x e^{-λ/q} / x!, x ≥ 0.

Finally we will derive a formula for P^n(x, y). Suppose X_0 has a Poisson distribution with parameter t. It is left as an exercise for the reader to show that X_n has a Poisson distribution with parameter

tp^n + (λ/q)(1 - p^n).

Thus

Σ_{x=0}^∞ (t^x e^{-t} / x!) P^n(x, y) = [tp^n + (λ/q)(1 - p^n)]^y e^{-tp^n - (λ/q)(1 - p^n)} / y!,

and hence

(17) Σ_{x=0}^∞ (t^x / x!) P^n(x, y) = e^{t(1-p^n)} e^{-λ(1-p^n)/q} [tp^n + (λ/q)(1 - p^n)]^y / y!.


Now if

(Σ_{x=0}^∞ a_x t^x)(Σ_{x=0}^∞ b_x t^x) = Σ_{x=0}^∞ c_x t^x,

where each power series has a positive radius of convergence, then

c_x = Σ_{z=0}^x a_z b_{x-z}.

If a_z = 0 for z > y, then

c_x = Σ_{z=0}^{min(x,y)} a_z b_{x-z}.

Using this with (17) and the binomial expansion, we conclude that

P^n(x, y) = (x! e^{-λ(1-p^n)/q} / y!) Σ_{z=0}^{min(x,y)} (y choose z) p^{nz} [(λ/q)(1 - p^n)]^{y-z} (1 - p^n)^{x-z} / (x - z)!,

which simplifies slightly to

(18) P^n(x, y) = e^{-λ(1-p^n)/q} Σ_{z=0}^{min(x,y)} (x choose z) p^{nz} (1 - p^n)^{x-z} [(λ/q)(1 - p^n)]^{y-z} / (y - z)!.

Since 0 < p < 1,

lim_{n→∞} p^n = 0.

Thus as n → ∞, the terms in the sum in (18) all approach zero except for the term corresponding to z = 0. We conclude that

(19) lim_{n→∞} P^n(x, y) = (λ/q)^y e^{-λ/q} / y! = π(y), x, y ≥ 0.

Thus (2) holds for this chain, and consequently the distribution π given by (16) is the unique stationary distribution of the chain.
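Both (16) and the transition function (15) can be checked numerically on a truncated state space: with π the Poisson distribution with parameter λ/q, the sums Σ_x π(x)P(x, y) reproduce π(y) up to the neglected tail mass. The values of λ and p below are illustrative.

```python
# Numerical check of (15) and (16): the Poisson(lambda/q) distribution is
# stationary for the particles-in-a-box chain. The state space is truncated,
# so the identity holds up to the (tiny) neglected tail mass.
from math import comb, exp, factorial

lam, p = 1.5, 0.6
q = 1 - p
N = 60                                    # truncation level

def poisson_pmf(k, mean):
    return mean ** k * exp(-mean) / factorial(k)

def P(x, y):
    """Transition function (15)."""
    return sum(poisson_pmf(y - z, lam) * comb(x, z) * p ** z * q ** (x - z)
               for z in range(min(x, y) + 1))

pi = [poisson_pmf(x, lam / q) for x in range(N)]
for y in range(15):
    assert abs(sum(pi[x] * P(x, y) for x in range(N)) - pi[y]) < 1e-10
```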

2.3. Average number of visits to a recurrent state

Consider an irreducible birth and death chain with stationary distribution π. Suppose that P(x, x) = r_x = 0, x ∈ 𝒮, as in the Ehrenfest chain and the gambler's ruin chain. Then at each transition the birth and death chain moves either one step to the right or one step to the left. Thus the chain can return to its starting point only after an even number of transitions. In other words, P^n(x, x) = 0 for odd values of n. For such a chain the formula

lim_{n→∞} P^n(x, y) = π(y), y ∈ 𝒮,

clearly fails to hold.


There is a way to handle such situations. Let a_n, n ≥ 0, be a sequence of numbers. If

(20) lim_{n→∞} a_n = L

for some finite number L, then

(21) lim_{n→∞} (1/n) Σ_{m=1}^n a_m = L.

Formula (21) can hold, however, even if (20) fails to hold. For example, if a_n = 0 for n odd and a_n = 1 for n even, then a_n has no limit as n → ∞, but

lim_{n→∞} (1/n) Σ_{m=1}^n a_m = 1/2.

In this section we will show that

lim_{n→∞} (1/n) Σ_{m=1}^n P^m(x, y)

exists for every pair x, y of states for an arbitrary Markov chain. In Section 2.5 we will use the existence of these limits to determine which Markov chains have stationary distributions and when there is such a unique distribution.

Recall that

1_y(z) = 1 if z = y, and 1_y(z) = 0 if z ≠ y,

and that

(22) E_x(1_y(X_m)) = P_x(X_m = y) = P^m(x, y).

Set

N_n(y) = Σ_{m=1}^n 1_y(X_m)

and

G_n(x, y) = Σ_{m=1}^n P^m(x, y).

Then N_n(y) denotes the number of visits of the Markov chain to y during times m = 1, ..., n. The expected number of such visits for a chain starting at x is given according to (22) by

(23) E_x(N_n(y)) = G_n(x, y).

Let y be a transient state. Then

lim_{n→∞} N_n(y) = N(y) < ∞ with probability one,


and

lim_{n→∞} G_n(x, y) = G(x, y) < ∞, x ∈ 𝒮.

It follows that

(24) lim_{n→∞} N_n(y)/n = 0 with probability one,

and that

(25) lim_{n→∞} G_n(x, y)/n = 0, x ∈ 𝒮.

Observe that N_n(y)/n is the proportion of the first n units of time that the chain is in state y and that G_n(x, y)/n is the expected value of this proportion for a chain starting at x.

Suppose now that y is a recurrent state. Let m_y = E_y(T_y) denote the mean return time to y for a chain starting at y if this return time has finite expectation, and set m_y = ∞ otherwise. Let 1{T_y < ∞} denote the random variable that is 1 if T_y < ∞ and 0 if T_y = ∞.

We will use the strong law of large numbers to prove the main result of this section, namely, Theorem 1 below.

Strong Law of Large Numbers. Let ξ_1, ξ_2, ... be independent identically distributed random variables. If these random variables have finite mean μ, then

lim_{n→∞} (ξ_1 + ··· + ξ_n)/n = μ with probability one.

If these random variables are nonnegative and fail to have finite expectation, then this limit holds, provided that we set μ = +∞.

This important theorem is proved in advanced probability texts.

Theorem 1 Let y be a recurrent state. Then

(26) lim_{n→∞} N_n(y)/n = 1{T_y < ∞}/m_y with probability one,

and

(27) lim_{n→∞} G_n(x, y)/n = ρ_{xy}/m_y, x ∈ 𝒮.

These formulas are intuitively very reasonable. Once a chain reaches y, it returns to y "on the average every m_y units of time." Thus if T_y < ∞ and n is large, the proportion of the first n units of time that the chain is in state y should be about 1/m_y. Formula (27) should follow from (26) by taking expectations.

From Corollary 1 of Chapter 1 and the above theorem, we immediately obtain the next result.

Corollary 1 Let C be an irreducible closed set of recurrent states. Then

(28) lim_{n→∞} G_n(x, y)/n = 1/m_y, x, y ∈ C,

and if P(X_0 ∈ C) = 1, then with probability one

(29) lim_{n→∞} N_n(y)/n = 1/m_y, y ∈ C.

If m_y = ∞ the right sides of (26)-(29) all equal zero, and hence (24) and (25) hold.

Proof. In order to verify Theorem 1, we need to introduce some additional random variables. Consider a Markov chain starting at a recurrent state y. With probability one it returns to y infinitely many times. For r ≥ 1 let T_y^r denote the time of the rth visit to y, so that

T_y^r = min(n ≥ 1 : N_n(y) = r).

Set W_y^1 = T_y^1 = T_y, and for r ≥ 2 let W_y^r = T_y^r - T_y^{r-1} denote the waiting time between the (r - 1)th visit to y and the rth visit to y. Clearly

T_y^r = W_y^1 + ··· + W_y^r.

The random variables W_y^1, W_y^2, ... are independent and identically distributed, and hence they have common mean E_y(W_y^1) = E_y(T_y) = m_y. This result should be intuitively obvious, since every time the chain returns to y it behaves from then on just as would a chain starting out initially at y. One can give a rigorous proof of this result by using (27) of Chapter 1 to show that for r ≥ 1

P_y(W_y^{r+1} = m | W_y^1 = m_1, ..., W_y^r = m_r) = P_y(T_y = m),

and then showing by induction that

P_y(W_y^1 = m_1, ..., W_y^r = m_r) = P_y(W_y^1 = m_1) ··· P_y(W_y^1 = m_r).

The strong law of large numbers implies that

lim_{k→∞} (W_y^1 + W_y^2 + ··· + W_y^k)/k = m_y with probability one,


i.e., that

(30) lim_{k→∞} T_y^k/k = m_y with probability one.

Set r = N_n(y). By time n the chain has made exactly r visits to y. Thus the rth visit to y occurs on or before time n, and the (r + 1)th visit to y occurs after time n; that is,

T_y^{N_n(y)} ≤ n < T_y^{N_n(y)+1},

and hence

T_y^{N_n(y)}/N_n(y) ≤ n/N_n(y) < T_y^{N_n(y)+1}/N_n(y),

or at least these results hold for n large enough so that N_n(y) ≥ 1. Since N_n(y) → ∞ with probability one as n → ∞, these inequalities and (30) together imply that

lim_{n→∞} n/N_n(y) = m_y with probability one,

or, equivalently, that (29) holds.

Let y be a recurrent state as before, but let X_0 have an arbitrary distribution. Then the chain may never reach y. If it does reach y, however, the above argument is valid; and hence, with probability one, N_n(y)/n → 1{T_y < ∞}/m_y as n → ∞. Thus (26) is valid. By definition 0 ≤ N_n(y) ≤ n, and hence

(31) 0 ≤ N_n(y)/n ≤ 1.

A theorem from measure theory, known as the dominated convergence theorem, allows us to conclude from (26) and (31) that

lim_{n→∞} E_x(N_n(y))/n = E_x(1{T_y < ∞}/m_y) = P_x(T_y < ∞)/m_y = ρ_{xy}/m_y,

and hence from (23) that (27) holds. This completes the proof of Theorem 1. ∎

2.4. Null recurrent and positive recurrent states

A recurrent state y is called null recurrent if m_y = ∞. From Theorem 1 we see that if y is null recurrent, then

(32) lim_{n→∞} G_n(x, y)/n = lim_{n→∞} (Σ_{m=1}^n P^m(x, y))/n = 0, x ∈ 𝒮.


(It can be shown that if y is null recurrent, then

(33) lim_{n→∞} P^n(x, y) = 0, x ∈ 𝒮,

which is a stronger result than (32). We will not prove (33), since it will not be needed later and its proof is rather difficult.)

A recurrent state y is called positive recurrent if m_y < ∞. It follows from Theorem 1 that if y is positive recurrent, then

lim_{n→∞} G_n(y, y)/n = 1/m_y > 0.

Thus (32) and (33) fail to hold for positive recurrent states.

Consider a Markov chain starting out in a recurrent state y. It follows from Theorem 1 that if y is null recurrent, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches zero as n → ∞. On the other hand, if y is a positive recurrent state, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches the positive limit 1/m_y as n → ∞.

The next result is closely related to Theorem 2 of Chapter 1.

Theorem 2 Let x be a positive recurrent state and suppose that x leads to y. Then y is positive recurrent.

Proof. It follows from Theorem 2 of Chapter 1 that y leads to x. Thus there exist positive integers n_1 and n_2 such that

P^{n_1}(y, x) > 0 and P^{n_2}(x, y) > 0.

Now

P^{n_1+m+n_2}(y, y) ≥ P^{n_1}(y, x) P^m(x, x) P^{n_2}(x, y),

and by summing on m = 1, 2, ..., n and dividing by n, we conclude that

G_{n_1+n+n_2}(y, y)/n - G_{n_1+n_2}(y, y)/n ≥ P^{n_1}(y, x) P^{n_2}(x, y) G_n(x, x)/n.

As n → ∞, the left side of this inequality converges to 1/m_y and the right side converges to P^{n_1}(y, x) P^{n_2}(x, y)/m_x. Thus

1/m_y ≥ P^{n_1}(y, x) P^{n_2}(x, y)/m_x > 0,

and consequently m_y < ∞. This shows that y is positive recurrent. ∎


From this theorem and from Theorem 2 of Chapter 1 we see that if C is an irreducible closed set, then every state in C is transient, every state in C is null recurrent, or every state in C is positive recurrent. A Markov chain is called a null recurrent chain if all its states are null recurrent and a positive recurrent chain if all its states are positive recurrent. We see therefore that an irreducible Markov chain is a transient chain, a null recurrent chain, or a positive recurrent chain.

If C is a finite closed set of states, then C has at least one positive recurrent state. For

Σ_{y∈C} P^m(x, y) = 1, x ∈ C,

and by summing on m = 1, ..., n and dividing by n we find that

Σ_{y∈C} G_n(x, y)/n = 1, x ∈ C.

If C is finite and each state in C is transient or null recurrent, then (25) holds and hence

1 = lim_{n→∞} Σ_{y∈C} G_n(x, y)/n = Σ_{y∈C} lim_{n→∞} G_n(x, y)/n = 0,

a contradiction.

We are now able to sharpen Theorem 3 of Chapter 1.

Theorem 3 Let C be a finite irreducible closed set of states. Then every state in C is positive recurrent.

Proof. The proof of this theorem is now almost immediate. Since C is a finite closed set, there is at least one positive recurrent state in C. Since C is irreducible, every state in C is positive recurrent by Theorem 2. ∎

Corollary 2 An irreducible Markov chain having a finite number of states is positive recurrent.

Corollary 3 A Markov chain having a finite number of states has no null recurrent states.

Proof. Corollary 2 follows immediately from Theorem 3. To verify Corollary 3, observe that if y is a recurrent state, then, by Theorem 4 of Chapter 1, y is contained in an irreducible closed set C of recurrent states. Since C is necessarily finite, it follows from Theorem 3 that all states in C, including y itself, are positive recurrent. Thus every recurrent state is positive recurrent, and hence there are no null recurrent states. ∎


Example 4. Consider the Markov chain described in Example 10 of Chapter 1. We have seen that 1 and 2 are transient states and that 0, 3, 4, and 5 are recurrent states. We now see that these recurrent states are necessarily positive recurrent.

2.5. Existence and uniqueness of stationary distributions

In this section we will determine which Markov chains have stationary distributions and when there is a unique such distribution. In our discussion we will need to interchange summations and limits on several occasions. This is justified by the following standard elementary result in analysis, which we state without proof.

Bounded Convergence Theorem. Let a(x), x ∈ 𝒮, be nonnegative numbers having finite sum, and let b_n(x), x ∈ 𝒮 and n ≥ 1, be such that |b_n(x)| ≤ 1, x ∈ 𝒮 and n ≥ 1, and

lim_{n→∞} b_n(x) = b(x), x ∈ 𝒮.

Then

lim_{n→∞} Σ_x a(x)b_n(x) = Σ_x a(x)b(x).

Let π be a stationary distribution and let m be a positive integer. Then by (3)

Σ_z π(z)P^m(z, x) = π(x).

Summing this equation on m = 1, 2, ..., n and dividing by n, we conclude that

(34) Σ_z π(z) G_n(z, x)/n = π(x), x ∈ 𝒮.

Theorem 4 Let π be a stationary distribution. If x is a transient state or a null recurrent state, then π(x) = 0.

Proof. If x is a transient state or a null recurrent state,

(35) lim_{n→∞} G_n(z, x)/n = 0, z ∈ 𝒮,

as shown in Sections 2.3 and 2.4. It follows from (34), (35), and the bounded convergence theorem that

π(x) = lim_{n→∞} Σ_z π(z) G_n(z, x)/n = 0,

as desired. ∎


It follows from this theorem that a Markov chain with no positive recurrent states does not have a stationary distribution.

Theorem 5 An irreducible positive recurrent Markov chain has a unique stationary distribution π, given by

(36) π(x) = 1/m_x, x ∈ 𝒮.

Proof. It follows from Theorem 1 and the assumptions of this theorem that

(37) lim_{n→∞} G_n(z, x)/n = 1/m_x, x, z ∈ 𝒮.

Suppose π is a stationary distribution. We see from (34), (37), and the bounded convergence theorem that

π(x) = lim_{n→∞} Σ_z π(z) G_n(z, x)/n = (1/m_x) Σ_z π(z) = 1/m_x.

Thus if there is a stationary distribution, it must be given by (36). To complete the proof of the theorem we need to show that the function π(x), x ∈ 𝒮, defined by (36) is indeed a stationary distribution. It is clearly nonnegative, so we need only show that

(38) Σ_x 1/m_x = 1

and

(39) Σ_x (1/m_x) P(x, y) = 1/m_y, y ∈ 𝒮.

Toward this end we observe first that

Σ_x P^m(z, x) = 1.

Summing on m = 1, ..., n and dividing by n, we conclude that

(40) Σ_x G_n(z, x)/n = 1, z ∈ 𝒮.

Next we observe that by (24) of Chapter 1

Σ_x P^m(z, x)P(x, y) = P^{m+1}(z, y).


By again summing on m = 1, ..., n and dividing by n, we conclude that

(41) Σ_x (G_n(z, x)/n) P(x, y) = G_{n+1}(z, y)/n - P(z, y)/n.

If 𝒮 is finite, we conclude from (37) and (40) that

1 = lim_{n→∞} Σ_x G_n(z, x)/n = Σ_x 1/m_x,

i.e., that (38) holds. Similarly, we conclude that (39) holds by letting n → ∞ in (41). This completes the proof of the theorem if 𝒮 is finite.

The argument to complete the proof for 𝒮 infinite is more complicated, since we cannot directly interchange limits and sums as we did for 𝒮 finite (the bounded convergence theorem is not applicable). Let 𝒮_1 be a finite subset of 𝒮. We see from (40) that

Σ_{x∈𝒮_1} G_n(z, x)/n ≤ 1.

Since 𝒮_1 is finite, we can let n → ∞ in this inequality and conclude from (37) that

Σ_{x∈𝒮_1} 1/m_x ≤ 1.

The last inequality holds for any finite subset 𝒮_1 of 𝒮, and hence

(42) Σ_x 1/m_x ≤ 1.

For if the sum of 1/m_x over x ∈ 𝒮 exceeded 1, the sum over some finite subset of 𝒮 would also exceed 1.

Similarly, we conclude from (41) that if 𝒮_1 is a finite subset of 𝒮, then

Σ_{x∈𝒮_1} (G_n(z, x)/n) P(x, y) ≤ G_{n+1}(z, y)/n - P(z, y)/n.

By letting n → ∞ in this inequality and using (37), we obtain

Σ_{x∈𝒮_1} (1/m_x) P(x, y) ≤ 1/m_y.

We conclude, as in the proof of (42), that

(43) Σ_x (1/m_x) P(x, y) ≤ 1/m_y, y ∈ 𝒮.


Next we will show that equality holds in (43). It follows from (42) that the sum on y of the right side of (43) is finite. If strict inequality held for some y, it would follow by summing (43) on y that

Σ_y 1/m_y > Σ_y (Σ_x (1/m_x) P(x, y)) = Σ_x (1/m_x) (Σ_y P(x, y)) = Σ_x 1/m_x,

which is a contradiction. This proves that equality holds in (43), i.e., that (39) holds.

Set

c = 1 / Σ_x (1/m_x).

Then by (39)

π(x) = c/m_x, x ∈ 𝒮,

defines a stationary distribution. Thus by the first part of the proof of this theorem

c/m_x = 1/m_x, x ∈ 𝒮,

and hence c = 1. This proves that (38) holds and completes the proof of the theorem. ∎

From Theorems 4 and 5 we immediately obtain

Corollary 4 An irreducible Markov chain is positive recurrent if and only if it has a stationary distribution.

Example 5. Consider an irreducible birth and death chain on the nonnegative integers. Find necessary and sufficient conditions for the chain to be

(a) positive recurrent, (b) null recurrent, (c) transient.

From Section 2.2.1 we see that the chain has a stationary distribution if and only if

(44) Σ_{x=1}^∞ p_0 ··· p_{x-1} / q_1 ··· q_x < ∞.


Thus (44) is necessary and sufficient for the chain to be positive recurrent. We saw in Section 1.7 that

(45)    Σ_{x=1}^∞ (q₁ ··· q_x)/(p₁ ··· p_x) < ∞

is a necessary and sufficient condition for the chain to be transient. For the chain to be null recurrent, it is necessary and sufficient that (44) and (45) both fail to hold. Thus

(46)    Σ_{x=1}^∞ (q₁ ··· q_x)/(p₁ ··· p_x) = ∞

and

    Σ_{x=1}^∞ (p₀ ··· p_{x−1})/(q₁ ··· q_x) = ∞

are necessary and sufficient conditions for the chain to be null recurrent.

As an immediate consequence of Corollary 2 and Theorem 5 we obtain

Corollary 5 If a Markov chain having a finite number of states is irreducible, it has a unique stationary distribution.
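Conditions (44)–(46) for a birth and death chain are easy to probe numerically for a concrete choice of rates. The sketch below is an illustration only, with hypothetical constant rates p_x = p and q_x = q; stabilizing partial sums indicate convergence, steadily growing ones indicate divergence.

```python
def series_44(p, q, n):
    # partial sum of (44): sum over x >= 1 of (p_0*...*p_{x-1})/(q_1*...*q_x)
    s, prod = 0.0, 1.0
    for x in range(1, n + 1):
        prod *= p(x - 1) / q(x)
        s += prod
    return s

def series_45(p, q, n):
    # partial sum of (45): sum over x >= 1 of (q_1*...*q_x)/(p_1*...*p_x)
    s, prod = 0.0, 1.0
    for x in range(1, n + 1):
        prod *= q(x) / p(x)
        s += prod
    return s

# hypothetical birth and death rates: p_x = 0.3 and q_x = 0.7 for all x
p = lambda x: 0.3
q = lambda x: 0.7
```

With p < q the partial sums of (44) stabilize (positive recurrence), while those of (45) grow without bound, in line with the classification above.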

Recall that N_n(x) denotes the number of visits to x during times m = 1, . . . , n. By combining Corollary 1 and Theorem 5 we get

Corollary 6  Let X_n, n ≥ 0, be an irreducible positive recurrent Markov chain having stationary distribution π. Then with probability one

(47)    lim_{n→∞} N_n(x)/n = π(x),    x ∈ 𝒮.
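Corollary 6 can be watched empirically. Below is a minimal simulation sketch, assuming a hypothetical two-state chain with P(0, 1) = 0.3 and P(1, 0) = 0.7, whose stationary distribution is π(0) = 0.7, π(1) = 0.3; the visit frequency N_n(0)/n should settle near π(0).

```python
import random

def visit_frequency(n_steps, seed=0):
    # Simulate the two-state chain and return N_n(0)/n, the fraction of
    # times 1..n_steps at which the chain is in state 0.
    rng = random.Random(seed)
    x, visits0 = 0, 0
    for _ in range(n_steps):
        if x == 0:
            x = 1 if rng.random() < 0.3 else 0
        else:
            x = 0 if rng.random() < 0.7 else 1
        visits0 += (x == 0)
    return visits0 / n_steps
```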

2.5.1. Reducible chains.  Let π be a distribution on 𝒮, i.e., let π(x), x ∈ 𝒮, be nonnegative numbers adding to one, and let C be a subset of 𝒮. We say that π is concentrated on C if

    π(x) = 0,    x ∉ C.

By essentially the same argument used to prove Theorem 5 we can obtain a somewhat more general result.

Theorem 6  Let C be an irreducible closed set of positive recurrent states. Then the Markov chain has a unique stationary distribution π concentrated on C. It is given by

(48)    π(x) = 1/m_x for x ∈ C, and π(x) = 0 elsewhere.


Suppose C₀ and C₁ are two distinct irreducible closed sets of positive recurrent states of a Markov chain. It follows from Theorem 6 that the Markov chain has a stationary distribution π₀ concentrated on C₀ and a different stationary distribution π₁ concentrated on C₁. Moreover, the distributions π_α defined for 0 < α < 1 by

    π_α(x) = (1 − α)π₀(x) + απ₁(x),    x ∈ 𝒮,

are distinct stationary distributions (see Exercise 5). By combining Theorems 4–6 and their consequences, we obtain

Corollary 7  Let 𝒮_P denote the set of positive recurrent states of a Markov chain.

(i) If 𝒮_P is empty, the chain has no stationary distributions.
(ii) If 𝒮_P is a nonempty irreducible set, the chain has a unique stationary distribution.
(iii) If 𝒮_P is nonempty but not irreducible, the chain has an infinite number of distinct stationary distributions.

Consider now a Markov chain having a finite number of states. Then every recurrent state is positive recurrent and there is at least one such state. There are two possibilities: either the set 𝒮_R of recurrent states is irreducible and there is a unique stationary distribution, or 𝒮_R can be decomposed into two or more irreducible closed sets and there is an infinite number of distinct stationary distributions. The latter possibility holds for a Markov chain on 𝒮 = {0, 1, . . . , d} in which d > 0 and 0 and d are both absorbing states. The gambler's ruin chain on {0, 1, . . . , d} and the genetics model in Example 7 of Chapter 1 are of this type. For such a chain any distribution π_α, 0 ≤ α ≤ 1, of the form

    π_α(x) = 1 − α for x = 0,  α for x = d,  and 0 elsewhere,

is a stationary distribution.

Example 6.  Consider the Markov chain introduced in Example 10 of Chapter 1. Find the stationary distribution concentrated on each of the irreducible closed sets.

We saw in Section 1.6 that the set of recurrent states for this chain is decomposed into the absorbing state 0 and the irreducible closed set {3, 4, 5}. Clearly the unique stationary distribution concentrated on {0} is given by π₀ = (1, 0, 0, 0, 0, 0). To find the unique stationary distribution concentrated on {3, 4, 5}, we must find nonnegative numbers π(3), π(4), and π(5) summing to one and satisfying the three equations

    π(3)/3 = π(4),
    π(3)/2 + π(4)/2 + 3π(5)/4 = π(5),
    π(3)/6 + π(4)/2 + π(5)/4 = π(3).

From the first two of these equations we find that π(4) = π(3)/3 and π(5) = 8π(3)/3. Thus

    π(3)(1 + 1/3 + 8/3) = 1,

from which we conclude that

    π(3) = 1/4,    π(4) = 1/12,    and    π(5) = 2/3.

Consequently

    π₁ = (0, 0, 0, 1/4, 1/12, 2/3)

is the stationary distribution concentrated on {3, 4, 5}.
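The computation in Example 6 can be reproduced by power iteration: repeatedly applying the transition matrix on {3, 4, 5} to any starting distribution. The rows used below are the ones implied by the balance equations of Example 6 (an inference; the full chain is defined in Example 10 of Chapter 1).

```python
# Transition matrix on the states {3, 4, 5}, rows inferred from the
# balance equations of Example 6 (an assumption, not quoted from the book).
P = [[1/6, 1/3, 1/2],
     [1/2, 0.0, 1/2],
     [1/4, 0.0, 3/4]]

def power_iterate(P, n=2000):
    # pi_{k+1} = pi_k P converges to the stationary distribution for an
    # irreducible aperiodic chain, whatever the starting distribution.
    pi = [1.0, 0.0, 0.0]
    for _ in range(n):
        pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    return pi
```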

2.6.  Queuing chain

Consider the queuing chain introduced in Example 5 of Chapter 1. Recall that the number of customers arriving in unit time has density f and mean μ. Suppose that the chain is irreducible, which means that f(0) > 0 and f(0) + f(1) < 1 (see Exercise 37 of Chapter 1). In Chapter 1 we saw that the chain is recurrent if μ ≤ 1 and transient if μ > 1. In Section 2.6.1 we will show that in the recurrent case

(49)    m₀ = 1/(1 − μ).

It follows from (49) that if μ < 1, then m₀ < ∞ and hence 0 is a positive recurrent state. Thus by irreducibility the chain is positive recurrent. On the other hand, if μ = 1, then m₀ = ∞ and hence 0 is a null recurrent state. We conclude that the queuing chain is null recurrent in this case. Therefore an irreducible queuing chain is positive recurrent if μ < 1, null recurrent if μ = 1, and transient if μ > 1.
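Formula (49) can be checked by simulation. The sketch below assumes a hypothetical arrival density with f(0) = 0.6 and f(2) = 0.4, so that μ = 0.8 and (49) predicts m₀ = 1/(1 − 0.8) = 5.

```python
import random

def mean_return_time(n_cycles=20000, seed=1):
    # Queuing chain X_{n+1} = max(X_n - 1, 0) + xi_{n+1}, started at 0;
    # the arrivals xi are 0 with probability 0.6 and 2 with probability 0.4.
    # Returns the average time for the chain to return to state 0.
    rng = random.Random(seed)
    total = 0
    for _ in range(n_cycles):
        x, steps = 0, 0
        while True:
            xi = 0 if rng.random() < 0.6 else 2
            x = max(x - 1, 0) + xi
            steps += 1
            if x == 0:
                break
        total += steps
    return total / n_cycles
```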


*2.6.1. Proof.  We will now verify (49). We suppose throughout the proof of this result that f(0) > 0, f(0) + f(1) < 1 and μ ≤ 1, so that the chain is irreducible and recurrent. Consider such a chain starting at the positive integer x. Then T_{x−1} denotes the time to go from state x to state x − 1, and T_{y−1} − T_y, 1 ≤ y ≤ x − 1, denotes the time to go from state y to state y − 1. Since the queuing chain goes at most one step to the left at a time, the Markov property insures that the random variables

    T_{x−1}, T_{x−2} − T_{x−1}, . . . , T₀ − T₁

are independent. These random variables are identically distributed; for each of them is distributed as

    min(n > 0 : ξ₁ + ··· + ξ_n = n − 1),

i.e., as the smallest positive integer n such that the number of customers served by time n is one more than the number of new customers arriving by time n.

Let G(t), 0 ≤ t ≤ 1, denote the probability generating function of the time to go from state 1 to state 0. Then

(50)    G(t) = Σ_{n=1}^∞ t^n P₁(T₀ = n).

The probability generating function of the sum of independent nonnegative integer-valued random variables is the product of their respective probability generating functions. If the chain starts at x, then

    T₀ = T_{x−1} + (T_{x−2} − T_{x−1}) + ··· + (T₀ − T₁)

is the sum of x independent random variables each having probability generating function G(t). Thus the probability generating function of T₀ is (G(t))^x; that is,

(51)    (G(t))^x = Σ_{n=1}^∞ t^n P_x(T₀ = n).

We will now show that

(52)    G(t) = tΦ(G(t)),    0 ≤ t ≤ 1,

where Φ denotes the probability generating function of f. To verify (52) we rewrite (50) as

    G(t) = Σ_{n=0}^∞ t^{n+1} P₁(T₀ = n + 1) = tP(1, 0) + t Σ_{n=1}^∞ t^n P₁(T₀ = n + 1).

* This material is optional and can be omitted with no loss of continuity.


By using successively (29) of Chapter 1, (51) of this chapter, and the formula P(1, y) = f(y), y ≥ 0, we find that

    G(t) = tP(1, 0) + t Σ_{n=1}^∞ t^n Σ_{y≠0} P(1, y) P_y(T₀ = n)
         = tP(1, 0) + t Σ_{y≠0} P(1, y) Σ_{n=1}^∞ t^n P_y(T₀ = n)
         = tP(1, 0) + t Σ_{y≠0} P(1, y)(G(t))^y
         = t[f(0) + Σ_{y≠0} f(y)(G(t))^y]
         = tΦ(G(t)).

For 0 < t < 1 we can differentiate both sides of (52) and obtain

    G′(t) = Φ(G(t)) + tG′(t)Φ′(G(t)).

Solving for G′(t) we find that

(53)    G′(t) = Φ(G(t)) / (1 − tΦ′(G(t))),    0 < t < 1.

Now G(t) → 1 and Φ(t) → 1 as t → 1, and

    lim_{t→1} Φ′(t) = lim_{t→1} Σ_{x=1}^∞ x f(x) t^{x−1} = Σ_{x=1}^∞ x f(x) = μ.

By letting t → 1 in (53) we see that

(54)    lim_{t→1} G′(t) = 1/(1 − μ).

By definition

    G(t) = Σ_{n=1}^∞ P₁(T₀ = n) t^n.

But since P(1, x) = P(0, x), x ≥ 0, it follows from (29) of Chapter 1 that the distribution of T₀ for a queuing chain starting in state 1 is the same as that for a chain starting in state 0. Consequently,

    G(t) = Σ_{n=1}^∞ P₀(T₀ = n) t^n,


and hence

    lim_{t→1} G′(t) = lim_{t→1} Σ_{n=1}^∞ n P₀(T₀ = n) t^{n−1}
                    = Σ_{n=1}^∞ n P₀(T₀ = n)
                    = E₀(T₀) = m₀.

It now follows from (54) that (49) holds.
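The functional equation (52) also gives a numerical route to m₀. With the same hypothetical arrival density f(0) = 0.6, f(2) = 0.4 (so Φ(s) = 0.6 + 0.4s² and μ = 0.8), iterating G ← tΦ(G) and then applying (53) should reproduce G′(t) → 1/(1 − μ) = 5 as t → 1.

```python
def Phi(s):
    # probability generating function of the assumed density f(0)=0.6, f(2)=0.4
    return 0.6 + 0.4 * s * s

def dPhi(s):
    return 0.8 * s

def G(t, iters=5000):
    # fixed-point iteration for G(t) = t * Phi(G(t)); starting from 0
    # converges to the smallest root, which is the generating function value
    g = 0.0
    for _ in range(iters):
        g = t * Phi(g)
    return g

def Gprime(t):
    # formula (53): G'(t) = Phi(G(t)) / (1 - t * Phi'(G(t)))
    g = G(t)
    return Phi(g) / (1.0 - t * dPhi(g))
```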

2.7.  Convergence to the stationary distribution

We have seen earlier in this chapter that if X_n, n ≥ 0, is an irreducible positive recurrent Markov chain having π as its stationary distribution, then

    lim_{n→∞} (1/n) Σ_{m=1}^n P^m(x, y) = lim_{n→∞} G_n(x, y)/n = π(y),    x, y ∈ 𝒮.

In this section we will see when the stronger result

    lim_{n→∞} P^n(x, y) = π(y),    x, y ∈ 𝒮,

holds and what happens when it fails to hold.

The positive integer d is said to be a divisor of the positive integer n if n/d is an integer. If I is a nonempty set of positive integers, the greatest common divisor of I, denoted by g.c.d. I, is defined to be the largest integer d such that d is a divisor of every integer in I. It follows immediately that

    1 ≤ g.c.d. I ≤ min(n : n ∈ I).

In particular, if 1 ∈ I, then g.c.d. I = 1. The greatest common divisor of the set of even positive integers is 2.

Let x be a state of a Markov chain such that P^n(x, x) > 0 for some n ≥ 1, i.e., such that ρ_xx = P_x(T_x < ∞) > 0. We define its period d_x by

    d_x = g.c.d. {n ≥ 1 : P^n(x, x) > 0}.

Then

    1 ≤ d_x ≤ min(n ≥ 1 : P^n(x, x) > 0).

If P(x, x) > 0, then d_x = 1. If x and y are two states, each of which leads to the other, then d_x = d_y.

For let n₁ and n₂ be positive integers such that

    P^{n₁}(x, y) > 0    and    P^{n₂}(y, x) > 0.

Then


    P^{n₁+n₂}(x, x) ≥ P^{n₁}(x, y) P^{n₂}(y, x) > 0,

and hence d_x is a divisor of n₁ + n₂. If P^n(y, y) > 0, then

    P^{n₁+n+n₂}(x, x) ≥ P^{n₁}(x, y) P^n(y, y) P^{n₂}(y, x) > 0,

so that d_x is a divisor of n₁ + n + n₂. Since d_x is a divisor of n₁ + n₂, it must be a divisor of n. Thus d_x is a divisor of all numbers in the set {n ≥ 1 : P^n(y, y) > 0}. Since d_y is the largest such divisor, we conclude that d_x ≤ d_y. Similarly d_y ≤ d_x, and hence d_x = d_y.

We have shown, in other words, that the states in an irreducible Markov chain have common period d. We say that the chain is periodic with period d if d > 1 and aperiodic if d = 1. A simple sufficient condition for an irreducible Markov chain to be aperiodic is that P(x, x) > 0 for some x ∈ 𝒮. Since P(0, 0) = f(0) > 0 for an irreducible queuing chain, such a chain is necessarily aperiodic.
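The definition of d_x can be applied mechanically: take the g.c.d. of all n ≤ N with P^n(x, x) > 0, for N large enough to capture the cycle structure. The sketch below uses an Ehrenfest chain with d = 3 (assumed transition probabilities P(x, x−1) = x/3, P(x, x+1) = (3 − x)/3, consistent with P²(0, 0) = p₀q₁ > 0) and a "lazy" version with holding probability 1/2, which is aperiodic.

```python
from math import gcd

def matmul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def period(P, x, N=20):
    # d_x = g.c.d. {n >= 1 : P^n(x, x) > 0}, truncated at n <= N
    d, Pn = 0, P
    for n in range(1, N + 1):
        if Pn[x][x] > 0:
            d = gcd(d, n)
        Pn = matmul(Pn, P)
    return d

# assumed Ehrenfest chain with d = 3, and a lazy modification of it
ehrenfest = [[0, 1, 0, 0],
             [1/3, 0, 2/3, 0],
             [0, 2/3, 0, 1/3],
             [0, 0, 1, 0]]
lazy = [[(0.5 if i == j else 0.0) + 0.5 * ehrenfest[i][j]
         for j in range(4)] for i in range(4)]
```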

Example 7.  Determine the period of an irreducible birth and death chain.

If some r_x > 0, then P(x, x) = r_x > 0, and the birth and death chain is aperiodic. In particular, the modified Ehrenfest chain in Example 3 is aperiodic.

Suppose r_x = 0 for all x. Then in one transition the state of the chain changes either from an odd numbered state to an even numbered state or from an even numbered state to an odd numbered state. In particular, a chain can return to its initial state only after an even number of transitions. Thus the period of the chain is 2 or a multiple of 2. Since

    P²(0, 0) = p₀q₁ > 0,

we conclude that the chain is periodic with period 2. In particular, the Ehrenfest chain introduced in Example 2 of Chapter 1 is periodic with period 2.

Theorem 7  Let X_n, n ≥ 0, be an irreducible positive recurrent Markov chain having stationary distribution π. If the chain is aperiodic, then

(55)    lim_{n→∞} P^n(x, y) = π(y),    x, y ∈ 𝒮.

If the chain is periodic with period d, then for each pair x, y of states in 𝒮 there is an integer r, 0 ≤ r < d, such that P^n(x, y) = 0 unless n = md + r for some nonnegative integer m, and

(56)    lim_{m→∞} P^{md+r}(x, y) = dπ(y).


For an illustration of the second half of this theorem, consider an irreducible positive recurrent birth and death chain which is periodic with period 2. If y − x is even, then P^{2m+1}(x, y) = 0 for all m ≥ 0 and

    lim_{m→∞} P^{2m}(x, y) = 2π(y).

If y − x is odd, then P^{2m}(x, y) = 0 for all m ≥ 1 and

    lim_{m→∞} P^{2m+1}(x, y) = 2π(y).

We will prove this theorem in an appendix to this chapter, which can be omitted with no loss of continuity.

Example 8.  Determine the asymptotic behavior of the matrix P^n for the transition matrix P

(a) from Example 3, (b) from Example 2.

(a) The transition matrix P from Example 3 corresponds to an aperiodic irreducible Markov chain on {0, 1, 2, 3} having the stationary distribution given by

    π(0) = 1/8,  π(1) = 3/8,  π(2) = 3/8,  and  π(3) = 1/8.

It follows from Theorem 7 that for n large

    P^n ≐ [1/8  3/8  3/8  1/8]
          [1/8  3/8  3/8  1/8]
          [1/8  3/8  3/8  1/8]
          [1/8  3/8  3/8  1/8].

(b) The transition matrix P from Example 2 corresponds to a periodic irreducible Markov chain on {0, 1, 2, 3} having period 2 and the same stationary distribution as the chain in Example 3. From the discussion following the statement of Theorem 7, we conclude that for n large and even

    P^n ≐ [1/4   0   3/4   0 ]
          [ 0   3/4   0   1/4]
          [1/4   0   3/4   0 ]
          [ 0   3/4   0   1/4],

while for n large and odd

    P^n ≐ [ 0   3/4   0   1/4]
          [1/4   0   3/4   0 ]
          [ 0   3/4   0   1/4]
          [1/4   0   3/4   0 ].
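Example 8 can be spot-checked by raising matrices to a high power. The matrices below are assumed stand-ins for Examples 2 and 3: the Ehrenfest chain with d = 3 (P(x, x−1) = x/3, P(x, x+1) = (3 − x)/3) and a modified version with holding probability 1/2; both have the binomial stationary distribution (1/8, 3/8, 3/8, 1/8).

```python
def matpow(P, n):
    # n-step transition matrix P^n by repeated multiplication
    m = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(n):
        R = [[sum(R[i][k] * P[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
    return R

# assumed Ehrenfest chain (periodic, period 2) and a modified version
# with holding probability 1/2 (aperiodic)
ehrenfest = [[0, 1, 0, 0],
             [1/3, 0, 2/3, 0],
             [0, 2/3, 0, 1/3],
             [0, 0, 1, 0]]
modified = [[(0.5 if i == j else 0.0) + 0.5 * ehrenfest[i][j]
             for j in range(4)] for i in range(4)]
```

The aperiodic chain's P^n has all rows near π, while the periodic chain alternates between the two limiting patterns above.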


APPENDIX

2.8.  Proof of convergence

We will first prove Theorem 7 in the aperiodic case. Consider an aperiodic, irreducible, positive recurrent Markov chain having transition function P, state space 𝒮, and stationary distribution π. We will now verify that the conclusion of Theorem 7 holds for such a chain.

Choose a ∈ 𝒮 and let I be the set of positive integers defined by

    I = {n ≥ 1 : P^n(a, a) > 0}.

Then

(i) g.c.d. I = 1;
(ii) if m ∈ I and n ∈ I, then m + n ∈ I.

Property (ii) follows from the inequality

    P^{m+n}(a, a) ≥ P^m(a, a) P^n(a, a).

Properties (i) and (ii) imply that there is a positive integer n₁ such that n ∈ I for all n ≥ n₁. For completeness we will prove this number theoretic result in Section 2.8.2. Using this result we conclude that P^n(a, a) > 0 for n ≥ n₁.

Let x and y be any pair of states in 𝒮. Since the chain is irreducible, there exist positive integers n₂ and n₃ such that

    P^{n₂}(x, a) > 0    and    P^{n₃}(a, y) > 0.

Then for n ≥ n₁

    P^{n₂+n+n₃}(x, y) ≥ P^{n₂}(x, a) P^n(a, a) P^{n₃}(a, y) > 0.

We have shown, in other words, that for every pair x, y of states in 𝒮 there is a positive integer n₀ such that

(57)    P^n(x, y) > 0,    n ≥ n₀.

Set

    𝒮² = {(x, y) : x ∈ 𝒮 and y ∈ 𝒮}.

Then 𝒮² is the set of ordered pairs of elements of 𝒮. We will consider a Markov chain (X_n, Y_n) having state space 𝒮² and transition function P₂ defined by

    P₂((x₀, y₀), (x, y)) = P(x₀, x) P(y₀, y).

It follows that X_n, n ≥ 0, and Y_n, n ≥ 0, are each Markov chains having transition function P, and the successive transitions of the X_n chain and the Y_n chain are chosen independently of each other.


We will now develop properties of the Markov chain (X_n, Y_n). In particular, we will show that this chain is an aperiodic, irreducible, positive recurrent Markov chain. We will then use this chain to verify the conclusion of the theorem.

Choose (x₀, y₀) ∈ 𝒮² and (x, y) ∈ 𝒮². By (57) there is an n₀ > 0 such that

    P^n(x₀, x) > 0    and    P^n(y₀, y) > 0,    n ≥ n₀.

Then

(58)    P₂^n((x₀, y₀), (x, y)) = P^n(x₀, x) P^n(y₀, y) > 0,    n ≥ n₀.

We conclude from (58) that the chain is both irreducible and aperiodic.

The distribution π₂ on 𝒮² defined by π₂(x₀, y₀) = π(x₀)π(y₀) is a stationary distribution. For

    Σ_{(x₀,y₀)∈𝒮²} π₂(x₀, y₀) P₂((x₀, y₀), (x, y))
        = Σ_{x₀∈𝒮} Σ_{y₀∈𝒮} π(x₀)π(y₀) P(x₀, x) P(y₀, y)
        = π(x)π(y) = π₂(x, y).

Thus the chain on 𝒮² is positive recurrent; in particular, it is recurrent.

Set

    T = min(n > 0 : X_n = Y_n).

Choose a ∈ 𝒮. Since the (X_n, Y_n) chain is recurrent,

    T_{(a,a)} = min(n > 0 : (X_n, Y_n) = (a, a))

is finite with probability one. Clearly T ≤ T_{(a,a)}, and hence T is finite with probability one.

For any n ≥ 1 (regardless of the distribution of (X₀, Y₀))

(59)    P(X_n = y, T ≤ n) = P(Y_n = y, T ≤ n),    y ∈ 𝒮.

This formula is intuitively reasonable since the two chains are indistinguishable for n ≥ T. To make this argument precise, we choose 1 ≤ m ≤ n. Then for z ∈ 𝒮

(60)    P(X_n = y | T = m, X_m = Y_m = z) = P(Y_n = y | T = m, X_m = Y_m = z),

since both conditional probabilities equal P^{n−m}(z, y). Now the event {T ≤ n} is the union of the disjoint events

    {T = m, X_m = Y_m = z},    1 ≤ m ≤ n and z ∈ 𝒮,


so it follows from (60) and Exercise 4(d) of Chapter 1 that

    P(X_n = y | T ≤ n) = P(Y_n = y | T ≤ n),

and hence that (59) holds.

Equation (59) implies that

    P(X_n = y) = P(X_n = y, T ≤ n) + P(X_n = y, T > n)
               = P(Y_n = y, T ≤ n) + P(X_n = y, T > n)
               ≤ P(Y_n = y) + P(T > n),

and similarly that

    P(Y_n = y) ≤ P(X_n = y) + P(T > n).

Therefore for n ≥ 1

(61)    |P(X_n = y) − P(Y_n = y)| ≤ P(T > n),    y ∈ 𝒮.

Since T is finite with probability one,

(62)    lim_{n→∞} P(T > n) = 0.

We conclude from (61) and (62) that

(63)    lim_{n→∞} (P(X_n = y) − P(Y_n = y)) = 0,    y ∈ 𝒮.

Using (63), we can easily complete the proof of Theorem 7. Choose x ∈ 𝒮 and let the initial distribution of (X_n, Y_n) be such that P(X₀ = x) = 1 and

    P(Y₀ = y) = π(y),    y ∈ 𝒮.

Since X_n, n ≥ 0, and Y_n, n ≥ 0, are each Markov chains with transition function P, we see that

(64)    P(X_n = y) = P^n(x, y),    y ∈ 𝒮,

and

(65)    P(Y_n = y) = π(y),    y ∈ 𝒮.

Thus by (63)–(65)

    lim_{n→∞} (P^n(x, y) − π(y)) = lim_{n→∞} (P(X_n = y) − P(Y_n = y)) = 0,

and hence the conclusion of Theorem 7 holds.
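The coupling inequality (61) can be tabulated exactly for a small chain. The sketch below assumes a hypothetical two-state chain with P(0, 1) = 0.3 and P(1, 0) = 0.2, so π = (0.4, 0.6); we start X₀ = 0 and Y₀ ~ π, evolve the pair independently, and compare |P^n(0, y) − π(y)| with P(T > n).

```python
P = [[0.7, 0.3], [0.2, 0.8]]     # hypothetical two-state chain
pi = [0.4, 0.6]                  # stationary: 0.4*0.7 + 0.6*0.2 = 0.4

def coupling_table(n_max=30):
    # X_0 = 0, Y_0 ~ pi; the two chains move independently until they meet.
    # 'alive' carries the mass of uncoupled pairs (i, j) with i != j;
    # pairs that start or become equal are treated as coupled for good.
    alive = {(0, 1): pi[1]}
    px = [1.0, 0.0]              # distribution of X_n
    rows = []
    for n in range(1, n_max + 1):
        px = [px[0] * P[0][j] + px[1] * P[1][j] for j in range(2)]
        new = {}
        for (i, j), m in alive.items():
            for i2 in range(2):
                for j2 in range(2):
                    if i2 != j2:
                        new[(i2, j2)] = new.get((i2, j2), 0.0) \
                            + m * P[i][i2] * P[j][j2]
        alive = new
        tail = sum(alive.values())                    # P(T > n)
        gap = max(abs(px[y] - pi[y]) for y in range(2))
        rows.append((n, gap, tail))
    return rows
```

Every row of the table satisfies gap ≤ P(T > n), and the tail probability itself goes to zero, which is exactly the mechanism behind (61)–(63).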

2.8.1. Periodic case.  We first consider a slight extension of Theorem 7 in the aperiodic case. Let C be an irreducible closed set of positive recurrent states such that each state in C has period 1, and let π be the unique stationary distribution concentrated on C. By looking at the Markov chain restricted to C, we conclude that

    lim_{n→∞} P^n(x, y) = π(y) = 1/m_y,    x, y ∈ C.

In particular, if y is any positive recurrent state having period 1, then by letting C be the irreducible closed set containing y, we see that

(66)    lim_{n→∞} P^n(y, y) = 1/m_y.

We now proceed with the proof of Theorem 7 in the periodic case. Let X_n, n ≥ 0, be an irreducible positive recurrent Markov chain which is periodic with period d > 1. Set Y_m = X_{md}, m ≥ 0. Then Y_m, m ≥ 0, is a Markov chain having transition function Q = P^d. Choose y ∈ 𝒮. Then

    g.c.d. {m : Q^m(y, y) > 0} = g.c.d. {m : P^{md}(y, y) > 0}
                               = (1/d) g.c.d. {n : P^n(y, y) > 0}
                               = 1.

Thus all states have period 1 with respect to the Y_m chain.

Let the X_n chain and hence also the Y_m chain start at y. Since the X_n chain first returns to y at some multiple of d, it follows that the expected return time to y for the Y_m chain is d⁻¹m_y, where m_y is the expected return time to y for the X_n chain. In particular, y is a positive recurrent state for a Markov chain having transition function Q. By applying (66) to this transition function we conclude that

    lim_{m→∞} Q^m(y, y) = d/m_y = dπ(y),

and thus that

(67)    lim_{m→∞} P^{md}(y, y) = dπ(y),    y ∈ 𝒮.

Let x and y be any pair of states in 𝒮 and set

    r₁ = min(n : P^n(x, y) > 0).

Then, in particular, P^{r₁}(x, y) > 0. We will show that P^n(x, y) > 0 only if n − r₁ is an integral multiple of d.

Choose n₁ such that P^{n₁}(y, x) > 0. Then

    P^{r₁+n₁}(y, y) ≥ P^{n₁}(y, x) P^{r₁}(x, y) > 0,

and hence r₁ + n₁ is an integral multiple of d. If P^n(x, y) > 0, then by the same argument n + n₁ is an integral multiple of d, and therefore so is n − r₁. Thus n = kd + r₁ for some nonnegative integer k.

There is a nonnegative integer m₁ such that r₁ = m₁d + r, where 0 ≤ r < d. We conclude that

(68)    P^n(x, y) = 0 unless n = md + r

for some nonnegative integer m. It follows from (68) and from (28) of Chapter 1 that

(69)    P^{md+r}(x, y) = Σ_{k=0}^m P_x(T_y = kd + r) P^{(m−k)d}(y, y).

Set

    a_m(k) = P^{(m−k)d}(y, y) for 0 ≤ k ≤ m, and a_m(k) = 0 for k > m.

Then by (67) for each fixed k

    lim_{m→∞} a_m(k) = dπ(y).

We can apply the bounded convergence theorem (with 𝒮 replaced by {0, 1, 2, . . .}) to conclude from (69) that

    lim_{m→∞} P^{md+r}(x, y) = dπ(y) Σ_{k=0}^∞ P_x(T_y = kd + r)
                             = dπ(y) P_x(T_y < ∞)
                             = dπ(y),

and hence that (56) holds. This completes the proof of Theorem 7. ∎

2.8.2. A result from number theory.  Let I be a nonempty set of positive integers such that

(i) g.c.d. I = 1;
(ii) if m and n are in I, then m + n is in I.

Then there is an n₀ such that n ∈ I for all n ≥ n₀.

We will first prove that I contains two consecutive integers. Suppose otherwise. Then there is an integer k ≥ 2 and an n₁ ∈ I such that n₁ + k ∈ I and any two distinct integers in I differ by at least k. It follows from property (i) that there is an n ∈ I such that k is not a divisor of n. We can write

    n = mk + r,


where m is a nonnegative integer and 0 < r < k. It follows from property (ii) that (m + 1)(n₁ + k) and n + (m + 1)n₁ are each in I. Their difference is

    (m + 1)(n₁ + k) − n − (m + 1)n₁ = k + mk − n = k − r,

which is positive and smaller than k. This contradicts the definition of k.

We have shown that I contains two consecutive integers, say n₁ and n₁ + 1. Let n ≥ n₁². Then there are nonnegative integers m and r such that 0 ≤ r < n₁ and

    n − n₁² = mn₁ + r.

Thus

    n = r(n₁ + 1) + (n₁ − r + m)n₁,

which is in I by property (ii). This shows that n ∈ I for all

    n ≥ n₀ = n₁².
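The closure statement of Section 2.8.2 is easy to test by brute force. Take I to be generated by {3, 7} (which have g.c.d. 1) and closed under addition; every large enough n must lie in I. In this particular case everything from 12 on is in I, while 11 is not.

```python
def additive_closure(gens, limit):
    # all positive integers <= limit expressible as sums of the generators,
    # i.e. the closure of gens under addition, restricted to [1, limit]
    S = set(gens)
    changed = True
    while changed:
        changed = False
        for a in list(S):
            for g in gens:
                if a + g <= limit and a + g not in S:
                    S.add(a + g)
                    changed = True
    return S

I = additive_closure([3, 7], 100)
```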

Exercises


1  Consider a Markov chain having state space {0, 1, 2} and transition matrix

         0    1    2
    0 [ .4   .4   .2 ]
    1 [ .3   .4   .3 ]
    2 [ .2   .4   .4 ]

Show that this chain has a unique stationary distribution π and find π.

2  Consider a Markov chain having transition function P such that P(x, y) = α_y, x ∈ 𝒮 and y ∈ 𝒮, where the α_y's are constants. Show that the chain has a unique stationary distribution π, given by π(y) = α_y, y ∈ 𝒮.
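For a finite chain like the one in Exercise 1, a candidate stationary distribution can always be checked or approximated numerically by repeatedly applying P to any starting distribution (power iteration). A minimal sketch:

```python
P = [[0.4, 0.4, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.4, 0.4]]

def stationary(P, iters=1000):
    # power iteration: pi <- pi P until the distribution stops changing
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi
```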

3  Let π be a stationary distribution of a Markov chain. Show that if π(x) > 0 and x leads to y, then π(y) > 0.

4  Let π be a stationary distribution of a Markov chain. Suppose that y and z are two states such that for some constant c

    P(x, y) = cP(x, z),    x ∈ 𝒮.

Show that π(y) = cπ(z).

5  Let π₀ and π₁ be distinct stationary distributions for a Markov chain.

(a) Show that for 0 < α < 1, the function π_α defined by

    π_α(x) = (1 − α)π₀(x) + απ₁(x),    x ∈ 𝒮,

is a stationary distribution.


(b) Show that distinct values of α determine distinct stationary distributions π_α. Hint: Choose x₀ ∈ 𝒮 such that π₀(x₀) ≠ π₁(x₀) and show that π_α(x₀) = π_β(x₀) implies that α = β.

6  Consider a birth and death chain on the nonnegative integers and suppose that p₀ = 1, p_x = p > 0 for x ≥ 1, and q_x = q = 1 − p > 0 for x ≥ 1. Find the stationary distribution when it exists.

7  (a) Find the stationary distribution of the Ehrenfest chain.
   (b) Find the mean and variance of this distribution.

8  For general d, find the transition function of the modified Ehrenfest chain introduced in Example 3, and show that this chain has the same stationary distribution as does the original Ehrenfest chain.

9  Find the stationary distribution of the birth and death chain described in Exercise 2 of Chapter 1. Hint: Use the formula

    C(d, 0)² + ··· + C(d, d)² = C(2d, d),

where C(d, x) denotes the binomial coefficient.

10  Let X_n, n ≥ 0, be a positive recurrent irreducible birth and death chain, and suppose that X₀ has the stationary distribution π. Show that

    P(X₀ = y | X₁ = x) = P(x, y),    x, y ∈ 𝒮.

Hint: Use the definition of π_x given by (9).

11  Let X_n, n ≥ 0, be the Markov chain introduced in Section 2.2.2. Show that if X₀ has a Poisson distribution with parameter t, then X_n has a Poisson distribution with parameter

12  Let X_n, n ≥ 0, be as in Exercise 11. Show that

Hint: Use the result of Exercise 11 and equate coefficients of t^x in the appropriate power series.

13  Let X_n, n ≥ 0, be as in Exercise 11 and suppose that X₀ has the stationary distribution. Use the result of Exercise 12 to find cov(X_m, X_{m+n}), m ≥ 0 and n ≥ 0.

14  Consider a Markov chain on the nonnegative integers having transition function P given by P(x, x + 1) = p and P(x, 0) = 1 − p, where 0 < p < 1. Show that this chain has a unique stationary distribution π and find π.


15  The transition function of a Markov chain is called doubly stochastic if

    Σ_{x∈𝒮} P(x, y) = 1,    y ∈ 𝒮.

What is the stationary distribution of an irreducible Markov chain having d < ∞ states and a doubly stochastic transition function?

16  Consider an irreducible Markov chain having finite state space 𝒮, transition function P such that P(x, x) = 0, x ∈ 𝒮, and stationary distribution π. Let p_x, x ∈ 𝒮, be such that 0 < p_x ≤ 1, and let Q(x, y), x ∈ 𝒮 and y ∈ 𝒮, be defined by

    Q(x, x) = 1 − p_x    and    Q(x, y) = p_x P(x, y),    y ≠ x.

Show that Q is the transition function of an irreducible Markov chain having state space 𝒮 and stationary distribution π′, defined by

    π′(x) = p_x⁻¹ π(x) / Σ_{y∈𝒮} p_y⁻¹ π(y),    x ∈ 𝒮.

The interpretation of the chain with transition function Q is that starting from x, it has probability 1 − p_x of remaining in x and probability p_x of jumping according to the transition function P.
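The claim in Exercise 16 can be spot-checked numerically. The sketch below uses a hypothetical three-state chain with zero diagonal and arbitrary holding parameters p_x, builds the slowed-down chain Q, and verifies that the candidate π′ is stationary for Q.

```python
P = [[0.0, 0.5, 0.5],       # hypothetical chain with P(x, x) = 0
     [0.3, 0.0, 0.7],
     [0.6, 0.4, 0.0]]
p = [0.5, 0.25, 0.8]        # arbitrary holding parameters in (0, 1]

def stationary(M, iters=2000):
    # power iteration: pi <- pi M converges for an irreducible aperiodic chain
    m = len(M)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(m)) for j in range(m)]
    return pi

pi = stationary(P)
# the slowed-down chain Q and the candidate pi' from the exercise
Q = [[(1 - p[i] if i == j else 0.0) + p[i] * P[i][j] for j in range(3)]
     for i in range(3)]
w = [pi[i] / p[i] for i in range(3)]
pi_prime = [v / sum(w) for v in w]
```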

17  Consider the Ehrenfest chain. Suppose that initially all of the balls are in the second box. Find the expected amount of time until the system returns to that state. Hint: Use the result of Exercise 7(a).

18  A particle moves according to a Markov chain on {1, 2, . . . , c + d}, where c and d are positive integers. Starting from any one of the first c states, the particle jumps in one transition to a state chosen uniformly from the last d states; starting from any of the last d states, the particle jumps in one transition to a state chosen uniformly from the first c states.

(a) Show that the chain is irreducible.
(b) Find the stationary distribution.

19  Consider a Markov chain having the transition matrix given by Exercise 19 of Chapter 1.

(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.

20  Consider a Markov chain having transition matrix as in Exercise 20 of Chapter 1.

(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.


21  Let X_n, n ≥ 0, be the Ehrenfest chain with d = 4 and X₀ = 0.

(a) Find the approximate distribution of X_n for n large and even.
(b) Find the approximate distribution of X_n for n large and odd.

22  Consider a Markov chain on {0, 1, 2} having transition matrix

         0    1    2
    P = [ · · · ]

(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.


23  Consider a Markov chain on {0, 1, 2, 3, 4} having transition matrix

          0    1    2    3    4
    0 [  0   1/3  2/3   0    0  ]
    1 [  0    0    0   1/4  3/4 ]
    2 [  0    0    0   1/2  1/2 ]
    3 [  1    0    0    0    0  ]
    4 [  1    0    0    0    0  ]

(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.


3  Markov Pure Jump Processes

Consider again a system that at any time can be in one of a finite or countably infinite set 𝒮 of states. We call 𝒮 the state space of the system. In Chapters 1 and 2 we studied the behavior of such systems at integer times. In this chapter we will study the behavior of such systems over all times t ≥ 0.

3.1.  Construction of jump processes

Consider a system starting in state x₀ at time 0. We suppose that the system remains in state x₀ until some positive time τ₁, at which time the system jumps to a new state x₁ ≠ x₀. We allow the possibility that the system remains permanently in state x₀, in which case we set τ₁ = ∞. If τ₁ is finite, upon reaching x₁ the system remains there until some time τ₂ > τ₁ when it jumps to state x₂ ≠ x₁. If the system never leaves x₁, we set τ₂ = ∞. This procedure is repeated indefinitely. If some τ_m = ∞, we set τ_n = ∞ for n > m.

Let X(t) denote the state of the system at time t, defined by

(1)    X(t) = x₀ for 0 ≤ t < τ₁,  x₁ for τ₁ ≤ t < τ₂,  x₂ for τ₂ ≤ t < τ₃,  and so on.

The process defined by (1) is called a jump process. At first glance it might appear that (1) defines X(t) for all t ≥ 0. But this is not necessarily the case.

Consider, for example, a ball bouncing on the floor. Let the state of the system be the number of bounces it has made. We make the physically reasonable assumption that the time in seconds between the nth bounce and the (n + 1)th bounce is 2⁻ⁿ. Then x_n = n and

    τ_n = 1 + 1/2 + ··· + 1/2^{n−1} = 2 − 1/2^{n−1}.


We see that τ_n < 2 and τ_n → 2 as n → ∞. Thus (1) defines X(t) only for 0 ≤ t < 2. By the time t = 2 the ball will have made an infinite number of bounces. In this case it would be appropriate to define X(t) = ∞ for t ≥ 2.
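The bouncing-ball computation can be checked directly: the jump times τ_n increase but stay below 2, so the construction (1) never gets past t = 2. A small sketch:

```python
def tau(n):
    # time of the nth bounce: 1 + 1/2 + ... + 1/2**(n-1) = 2 - 2**(1-n)
    return sum(0.5 ** k for k in range(n))

taus = [tau(n) for n in range(1, 41)]
```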

In general, if

(2)    lim_{n→∞} τ_n < ∞,

we say that the X(t) process explodes. If the X(t) process does not explode, i.e., if

(3)    lim_{n→∞} τ_n = ∞,

then (1) does define X(t) for all t ≥ 0. We will now specify a probability structure for such a jump process.

We suppose that all states are of one of two types, absorbing or non­absorbing. Once the process reaches an absorbing state, it remains there permanently. With each non-absorbing state x, there is associated a distribution function Fx{it ), - 00 < t < 00, which vanishes for t < 0, and transition probabilities Q�xy, Y E f/, which are nonnegative and such that Qxx = 0 and

(4)

A process starting at x remains there for a random length of time τ_1 having distribution function F_x and then jumps to state X(τ_1) = y with probability Q_xy, y ∈ 𝒮. We assume that τ_1 and X(τ_1) are chosen independently of each other, i.e., that

P_x(τ_1 ≤ t, X(τ_1) = y) = F_x(t)Q_xy.

Here, as in the previous chapters, we use the notation P_x(·) and E_x(·) to denote probabilities of events and expectations of random variables defined in terms of a process initially in state x. Whenever and however the process jumps to a state y, it acts just as a process starting initially at y. For example, if x and y are both non-absorbing states,

P_x(τ_1 ≤ s, X(τ_1) = y, τ_2 − τ_1 ≤ t, X(τ_2) = z) = F_x(s)Q_xy F_y(t)Q_yz.

Similar formulas hold for events defined in terms of three or more jumps. If x is an absorbing state, we set Q_xy = δ_xy, where

δ_xy = 1 if y = x, and δ_xy = 0 if y ≠ x.

Equation (4) now holds for all x ∈ 𝒮.

We say that the jump process is pure or non-explosive if (3) holds with probability one regardless of the starting point. Otherwise we say the

process is explosive. If the state space 𝒮 is finite, the jump process is necessarily non-explosive. It is easy to construct examples having an infinite state space which are explosive. Such processes, however, are unlikely to arise in practical applications. At any rate, to keep matters simple we assume that our process is non-explosive. The set of probability zero where (3) fails to hold can safely be ignored. We see from (1) that X(t) is then defined for all t ≥ 0.

Let P_xy(t) denote the probability that a process starting in state x will be in state y at time t. Then

P_xy(t) ≥ 0 and Σ_y P_xy(t) = 1.

In particular, P_xy(0) = δ_xy. We can also choose the initial state x according to an initial distribution π_0(x), x ∈ 𝒮, where π_0(x) ≥ 0 and

Σ_x π_0(x) = 1.

In this case,

P(X(t) = y) = Σ_x π_0(x)P_xy(t).

The transition function P_xy(t) cannot be used directly to obtain joint probabilities such as P(X(s) = x, X(t) = y) unless the jump process satisfies the Markov property, which states that for 0 ≤ s_1 < ··· < s_n < s < t and x_1, ..., x_n, x, y ∈ 𝒮,

P(X(t) = y | X(s_1) = x_1, ..., X(s_n) = x_n, X(s) = x) = P_xy(t − s).

By a Markov pure jump process we mean a pure jump process that satisfies the Markov property. It can be shown, although not at the level of this book, that a pure jump process is Markovian if and only if all non-absorbing states x are such that

P_x(τ_1 > t + s | τ_1 > s) = P_x(τ_1 > t), s, t ≥ 0,

i.e., such that

(5) (1 − F_x(t + s))/(1 − F_x(s)) = 1 − F_x(t), s, t ≥ 0.

Now a distribution function F_x satisfies (5) if and only if it is an exponential distribution function (see Chapter 5 of Introduction to Probability Theory). We conclude that a pure jump process is Markovian if and only if F_x is an exponential distribution for all non-absorbing states x.

Let X(t), 0 ≤ t < ∞, be a Markov pure jump process. If x is a non-absorbing state, then F_x has an exponential density f_x. Let q_x denote the parameter of this density. Then q_x = 1/E_x(τ_1) > 0 and

f_x(t) = q_x e^{−q_x t} for t ≥ 0, and f_x(t) = 0 for t < 0.

Observe that

P_x(τ_1 > t) = ∫_t^∞ q_x e^{−q_x s} ds = e^{−q_x t}, t ≥ 0.

If x is an absorbing state, we set q_x = 0.

It follows from the Markov property that for 0 ≤ t_1 < ··· < t_n and x_1, ..., x_n in 𝒮,

(6) P(X(t_1) = x_1, ..., X(t_n) = x_n) = P(X(t_1) = x_1) P_{x_1 x_2}(t_2 − t_1) ··· P_{x_{n−1} x_n}(t_n − t_{n−1}).

In particular, for s ≥ 0 and t ≥ 0,

P_x(X(t) = z, X(t + s) = y) = P_xz(t)P_zy(s).

Since

P_xy(t + s) = Σ_z P_x(X(t) = z, X(t + s) = y),

we conclude that

(7) P_xy(t + s) = Σ_z P_xz(t)P_zy(s), s ≥ 0 and t ≥ 0.

Equation (7) is known as the Chapman-Kolmogorov equation.

The transition function P_xy(t) satisfies the integral equation

(8) P_xy(t) = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds, t ≥ 0,

which we will now verify. If x is an absorbing state, (8) reduces to the obvious fact that

P_xy(t) = δ_xy, t ≥ 0.

Suppose x is not an absorbing state. Then for a process starting at x, the event {τ_1 ≤ t, X(τ_1) = z, and X(t) = y} occurs if and only if the first jump occurs at some time s ≤ t and takes the process to z, and the process goes from z to y in the remaining t − s units of time. Thus

P_x(τ_1 ≤ t, X(τ_1) = z and X(t) = y) = ∫_0^t q_x e^{−q_x s} Q_xz P_zy(t − s) ds,

so

P_x(τ_1 ≤ t and X(t) = y) = Σ_{z≠x} P_x(τ_1 ≤ t, X(τ_1) = z and X(t) = y)
= ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds.

Also

P_x(τ_1 > t and X(t) = y) = δ_xy e^{−q_x t}.

Consequently,

P_xy(t) = P_x(X(t) = y)
= P_x(τ_1 > t and X(t) = y) + P_x(τ_1 ≤ t and X(t) = y)
= δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds,

as claimed.

Replacing s by t − s in the integral in (8), we can rewrite (8) as

(9) P_xy(t) = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x (t−s)} ( Σ_{z≠x} Q_xz P_zy(s) ) ds, t ≥ 0.

It follows from (9) that P_xy(t) is continuous in t for t ≥ 0. Therefore the integrand in (9) is a continuous function, so we can differentiate the right side. We obtain

(10) P′_xy(t) = −q_x P_xy(t) + q_x Σ_{z≠x} Q_xz P_zy(t), t ≥ 0.

In particular,

P′_xy(0) = −q_x P_xy(0) + q_x Σ_{z≠x} Q_xz P_zy(0) = −q_x δ_xy + q_x Q_xy.

Set

(11) q_xy = P′_xy(0), x, y ∈ 𝒮.

Then

(12) q_xy = −q_x if y = x, and q_xy = q_x Q_xy if y ≠ x.

It follows from (12) that

(13) q_x = −q_xx = Σ_{y≠x} q_xy.


The quantities q_xy, x ∈ 𝒮 and y ∈ 𝒮, are called the infinitesimal parameters of the process. These parameters determine q_x and Q_xy, and thus by our construction determine a unique Markov pure jump process. We can rewrite (10) in terms of the infinitesimal parameters as

(14) P′_xy(t) = Σ_z q_xz P_zy(t), t ≥ 0.

This equation is known as the backward equation.

If 𝒮 is finite, we can differentiate the Chapman-Kolmogorov equation with respect to s, obtaining

(15) P′_xy(t + s) = Σ_z P_xz(t)P′_zy(s), s ≥ 0 and t ≥ 0.

In particular,

P′_xy(t) = Σ_z P_xz(t)P′_zy(0), t ≥ 0,

or equivalently,

(16) P′_xy(t) = Σ_z P_xz(t)q_zy, t ≥ 0.

Formula (16) is known as the forward equation. It can be shown that (15) and (16) hold even if 𝒮 is infinite, but the proofs are not easy and will be omitted.

In Section 3.2 we will describe some examples in which the backward or forward equation can be used to find explicit formulas for P_xy(t).
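For a finite state space the backward equation can be written in matrix form as P′(t) = QP(t) with P(0) = I, whose solution is the matrix exponential P(t) = e^{Qt}. The sketch below (with a made-up 3-state rate matrix Q, not an example from the text) computes e^{Qt} by a truncated power series and checks the Chapman-Kolmogorov equation (7) numerically.

```python
# Sketch: solving the backward equation for a hypothetical 3-state process.
# Q has nonnegative off-diagonal entries q_xy and rows summing to zero.
import numpy as np

Q = np.array([[-2.0, 2.0, 0.0],
              [1.0, -3.0, 2.0],
              [0.0, 4.0, -4.0]])

def P(t, terms=60):
    """Truncated power series for the matrix exponential e^{Qt}."""
    out = np.eye(3)
    term = np.eye(3)
    for k in range(1, terms):
        term = term @ Q * (t / k)   # term = (Qt)^k / k!
        out = out + term
    return out

# Each row of P(t) is a probability distribution, and the
# Chapman-Kolmogorov equation P(s + t) = P(s) P(t) holds.
print(np.allclose(P(0.7).sum(axis=1), 1.0))        # True
print(np.allclose(P(0.3 + 0.4), P(0.3) @ P(0.4)))  # True
```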

3.2. Birth and death processes

Let 𝒮 = {0, 1, ..., d} or 𝒮 = {0, 1, 2, ...}. By a birth and death process on 𝒮 we mean a Markov pure jump process on 𝒮 having infinitesimal parameters q_xy such that

q_xy = 0, |y − x| > 1.

Thus a birth and death process starting at x can in one jump go only to the states x − 1 or x + 1.

The parameters λ_x = q_{x,x+1}, x ∈ 𝒮, and μ_x = q_{x,x−1}, x ∈ 𝒮, are called respectively the birth rates and death rates of the process. The parameters q_x and Q_xy of the process can be expressed simply in terms of the birth and death rates. By (13)

q_x = Σ_{y≠x} q_xy,

so that

(17) q_x = λ_x + μ_x and q_xx = −(λ_x + μ_x).

Thus x is an absorbing state if and only if λ_x = μ_x = 0. If x is a non-absorbing state, then by (12)

(18) Q_xy = μ_x/(λ_x + μ_x) if y = x − 1, Q_xy = λ_x/(λ_x + μ_x) if y = x + 1, and Q_xy = 0 elsewhere.

A birth and death process is called a pure birth process if μ_x = 0, x ∈ 𝒮, and a pure death process if λ_x = 0, x ∈ 𝒮. A pure birth process can move only to the right, and a pure death process can move only to the left.

Given nonnegative numbers λ_x, x ∈ 𝒮, and μ_x, x ∈ 𝒮, it is natural to ask whether there is a birth and death process corresponding to these parameters. Of course, μ_0 = 0 is a necessary requirement, as is λ_d = 0 if 𝒮 is finite. The only additional problem is that explosions must be ruled out if 𝒮 is infinite. It is not difficult to derive a necessary and sufficient condition for the process to be non-explosive. A simple sufficient condition for the process to be non-explosive is that for some positive numbers A and B

λ_x ≤ A + Bx, x ≥ 0.

This condition holds in all the examples we will consider.

In finding the birth and death rates of specific processes, we will use some standard properties of independent exponentially distributed random variables. Let ξ_1, ..., ξ_n be independent random variables having exponential distributions with respective parameters α_1, ..., α_n. Then min(ξ_1, ..., ξ_n) has an exponential distribution with parameter α_1 + ··· + α_n and

(19) P(ξ_k = min(ξ_1, ..., ξ_n)) = α_k/(α_1 + ··· + α_n), k = 1, ..., n.

Moreover, with probability one, the random variables ξ_1, ..., ξ_n take on n distinct values.

To verify these results we observe first that

P(min(ξ_1, ..., ξ_n) > t) = P(ξ_1 > t, ..., ξ_n > t)
= P(ξ_1 > t) ··· P(ξ_n > t)
= e^{−α_1 t} ··· e^{−α_n t} = e^{−(α_1 + ··· + α_n)t},

and hence that min(ξ_1, ..., ξ_n) has the indicated exponential distribution.

Set

η_k = min_{i≠k} ξ_i.

Then η_k has an exponential distribution with parameter

β_k = Σ_{i≠k} α_i,

and ξ_k and η_k are independent. Thus

P(ξ_k = min(ξ_1, ..., ξ_n)) = P(ξ_k < η_k)
= ∫_0^∞ ( ∫_x^∞ α_k e^{−α_k x} β_k e^{−β_k y} dy ) dx
= ∫_0^∞ α_k e^{−α_k x} e^{−β_k x} dx
= α_k/(α_k + β_k) = α_k/(α_1 + ··· + α_n).

In order to show that the random variables ξ_1, ..., ξ_n take on n distinct values with probability one, it is enough to show that P(ξ_i ≠ ξ_j) = 1 for i ≠ j. But since ξ_i and ξ_j have a joint density f, it follows that

P(ξ_i = ξ_j) = ∫∫_{{(x,y) : x = y}} f(x, y) dx dy = 0,

as desired.
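Formula (19) is easy to confirm by simulation. The sketch below (parameters chosen arbitrarily, not from the text) draws independent exponentials with rates 1, 2, 3 and tallies how often each is the minimum.

```python
# Monte Carlo check of (19): with independent exponentials of rates
# a_1,...,a_n, P(xi_k is the minimum) = a_k / (a_1 + ... + a_n).
import random

random.seed(0)
alphas = [1.0, 2.0, 3.0]
trials = 200_000
wins = [0, 0, 0]
for _ in range(trials):
    draws = [random.expovariate(a) for a in alphas]  # expovariate takes the rate
    wins[draws.index(min(draws))] += 1

estimates = [w / trials for w in wins]
exact = [a / sum(alphas) for a in alphas]  # [1/6, 2/6, 3/6]
print(estimates, exact)  # the two lists should agree closely
```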

Example 1. Branching process. Consider a collection of particles which act independently in giving rise to succeeding generations of particles. Suppose that each particle, from the time it appears, waits a random length of time having an exponential distribution with parameter q and then splits into two identical particles with probability p or disappears with probability 1 − p. Let X(t), 0 ≤ t < ∞, denote the number of particles present at time t. This branching process is a birth and death process. Find the birth and death rates.

Consider a branching process starting out with x particles. Let ξ_1, ..., ξ_x be the times until these particles split apart or disappear. Then ξ_1, ..., ξ_x each has an exponential distribution with parameter q, and hence τ_1 = min(ξ_1, ..., ξ_x) has an exponential distribution with parameter q_x = xq. Whichever particle acts first has probability p of splitting into two particles and probability 1 − p of disappearing. Thus for x ≥ 1

Q_{x,x+1} = p and Q_{x,x−1} = 1 − p.

State 0 is an absorbing state. Since λ_x = q_x Q_{x,x+1} and μ_x = q_x Q_{x,x−1}, we conclude that

λ_x = xqp and μ_x = xq(1 − p), x ≥ 0.

In the preceding example we did not actually prove that the process is a birth and death process, i.e., that it "starts from scratch" after making a jump. This intuitively reasonable property basically depends on the fact that an exponentially distributed random variable ξ satisfies the formula

P(ξ > t + s | ξ > s) = P(ξ > t), s, t ≥ 0,

but a rigorous proof is complicated.

By (17) and the definition of λ_x and μ_x, the backward and forward equations for a birth and death process can be written respectively as

(20) P′_xy(t) = μ_x P_{x−1,y}(t) − (λ_x + μ_x)P_xy(t) + λ_x P_{x+1,y}(t), t ≥ 0,

and

(21) P′_xy(t) = λ_{y−1} P_{x,y−1}(t) − (λ_y + μ_y)P_xy(t) + μ_{y+1} P_{x,y+1}(t), t ≥ 0.

In (21) we set λ_{−1} = 0, and if 𝒮 = {0, ..., d} for d < ∞, we set μ_{d+1} = 0.

We will solve the backward and forward equations for a birth and death process in some special cases. To do so we will use the result that if

(22) f′(t) = −a f(t) + g(t), t ≥ 0,

then

(23) f(t) = f(0)e^{−at} + ∫_0^t e^{−a(t−s)} g(s) ds, t ≥ 0.

The proof of this standard result is very easy. We multiply (22) through by e^{at} and rewrite the resulting equation as

d/dt ( e^{at} f(t) ) = e^{at} g(t).

Integrating from 0 to t we find that

e^{at} f(t) − f(0) = ∫_0^t e^{as} g(s) ds,

and hence that (23) holds.

3.2.1. Two-state birth and death process. Consider a birth and death process having state space 𝒮 = {0, 1}, and suppose that 0 and 1 are both non-absorbing states. Since μ_0 = λ_1 = 0, the process is

determined by the parameters λ_0 and μ_1. For simplicity in notation we set λ = λ_0 and μ = μ_1. We can interpret such a process by thinking of state 1 as the system (e.g., telephone or machine) operating and state 0 as the system being idle. We suppose that starting from an idle state the system remains idle for a random length of time which is exponentially distributed with parameter λ, and that starting in an operating state the system continues operating for a random length of time which is exponentially distributed with parameter μ.

We will find the transition function of the process by solving the backward equation. It is left as an exercise for the reader to obtain the same results by solving the forward equation.

Setting y = 0 in (20), we see that

(24) P′_00(t) = −λP_00(t) + λP_10(t), t ≥ 0,

and

(25) P′_10(t) = μP_00(t) − μP_10(t), t ≥ 0.

Subtracting the second equation from the first,

d/dt ( P_00(t) − P_10(t) ) = −(λ + μ)( P_00(t) − P_10(t) ).

Applying (23),

(26) P_00(t) − P_10(t) = ( P_00(0) − P_10(0) ) e^{−(λ+μ)t} = e^{−(λ+μ)t}.

Here we have used the formulas P_00(0) = 1 and P_10(0) = 0. It now follows from (24) that

P′_00(t) = −λ( P_00(t) − P_10(t) ) = −λe^{−(λ+μ)t}.

Thus

P_00(t) = P_00(0) + ∫_0^t P′_00(s) ds
= 1 − ∫_0^t λe^{−(λ+μ)s} ds
= 1 − (λ/(λ + μ))(1 − e^{−(λ+μ)t}),

or equivalently,

(27) P_00(t) = μ/(λ + μ) + (λ/(λ + μ)) e^{−(λ+μ)t}, t ≥ 0.


Now, by (26), P_10(t) = P_00(t) − e^{−(λ+μ)t}, and therefore

(28) P_10(t) = μ/(λ + μ) − (μ/(λ + μ)) e^{−(λ+μ)t}, t ≥ 0.

By setting y = 1 in the backward equation, or by subtracting P_00(t) and P_10(t) from one, we conclude that

(29) P_01(t) = λ/(λ + μ) − (λ/(λ + μ)) e^{−(λ+μ)t}, t ≥ 0,

and

(30) P_11(t) = λ/(λ + μ) + (μ/(λ + μ)) e^{−(λ+μ)t}, t ≥ 0.

From (27)-(30) we see that

(31) lim_{t→∞} P_xy(t) = π(y),

where

(32) π(0) = μ/(λ + μ) and π(1) = λ/(λ + μ).

If π_0 is the initial distribution of the process, then by (27) and (28)

P(X(t) = 0) = π_0(0)P_00(t) + (1 − π_0(0))P_10(t)
= μ/(λ + μ) + ( π_0(0) − μ/(λ + μ) ) e^{−(λ+μ)t}, t ≥ 0.

Similarly,

P(X(t) = 1) = λ/(λ + μ) + ( π_0(1) − λ/(λ + μ) ) e^{−(λ+μ)t}, t ≥ 0.

Thus P(X(t) = 0) and P(X(t) = 1) are independent of t if and only if π_0 is the distribution π given by (32).
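The explicit formulas (27)-(30) can be checked directly. The sketch below (with arbitrary rates λ and μ, not values from the text) builds the 2×2 transition matrix and verifies the Chapman-Kolmogorov equation and the limiting distribution (32).

```python
# Two-state birth and death process: P(t) from formulas (27)-(30).
import math

lam, mu = 1.5, 0.5  # hypothetical rates

def P(t):
    e = math.exp(-(lam + mu) * t)
    p00 = mu / (lam + mu) + lam / (lam + mu) * e   # formula (27)
    p11 = lam / (lam + mu) + mu / (lam + mu) * e   # formula (30)
    return [[p00, 1 - p00], [1 - p11, p11]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Chapman-Kolmogorov: P(0.3) P(0.9) = P(1.2).
C, D = matmul(P(0.3), P(0.9)), P(1.2)
print(all(abs(C[i][j] - D[i][j]) < 1e-12 for i in range(2) for j in range(2)))
# For large t, each row approaches (pi(0), pi(1)) = (mu, lam)/(lam + mu).
print(P(50.0)[0][0], mu / (lam + mu))
```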

3.2.2. Poisson process. Consider a pure birth process X(t), 0 ≤ t < ∞, on the nonnegative integers such that

λ_x = λ, x ≥ 0.

Since a pure birth process can move only to the right,

(33) P_xy(t) = 0, y < x and t ≥ 0.

Also,

(34) P_xx(t) = e^{−λt}, t ≥ 0.


The forward equation for y ≠ 0 is

P′_xy(t) = λP_{x,y−1}(t) − λP_xy(t), t ≥ 0.

From (23) we see that

P_xy(t) = e^{−λt}P_xy(0) + λ∫_0^t e^{−λ(t−s)} P_{x,y−1}(s) ds, t ≥ 0.

Since P_xy(0) = δ_xy, we conclude that for y > x

(35) P_xy(t) = λ∫_0^t e^{−λ(t−s)} P_{x,y−1}(s) ds, t ≥ 0.

It follows from (34) and (35) that

P_{x,x+1}(t) = λ∫_0^t e^{−λ(t−s)} e^{−λs} ds = λt e^{−λt}, t ≥ 0,

and hence by using (35) once more that

P_{x,x+2}(t) = λ∫_0^t e^{−λ(t−s)} λs e^{−λs} ds = λ²e^{−λt}∫_0^t s ds = ((λt)²/2) e^{−λt}.

By induction,

(36) P_xy(t) = ((λt)^{y−x} e^{−λt})/(y − x)!, 0 ≤ x ≤ y and t ≥ 0.

Formulas (33) and (36) imply that

(37) P_xy(t) = P_{0,y−x}(t), t ≥ 0,

and that if X(0) = x, then X(t) − x has a Poisson distribution with parameter λt.

In general, for 0 ≤ s < t, X(t) − X(s) has a Poisson distribution with parameter λ(t − s). For if 0 ≤ s < t and y is a nonnegative integer, then

P(X(t) − X(s) = y) = Σ_x P(X(s) = x and X(t) = x + y)
= Σ_x P(X(s) = x) P_{x,x+y}(t − s)
= Σ_x P(X(s) = x) P_{0y}(t − s)
= P_{0y}(t − s)
= ((λ(t − s))^y e^{−λ(t−s)})/y!.

If 0 ≤ t_1 < ··· < t_n, the random variables

X(t_2) − X(t_1), ..., X(t_n) − X(t_{n−1})


are independent. For we observe that if z_1, ..., z_{n−1} are arbitrary integers, then by (6) and (37)

P(X(t_2) − X(t_1) = z_1, ..., X(t_n) − X(t_{n−1}) = z_{n−1})
= Σ_x P(X(t_1) = x) P_{0z_1}(t_2 − t_1) ··· P_{0z_{n−1}}(t_n − t_{n−1})
= P_{0z_1}(t_2 − t_1) ··· P_{0z_{n−1}}(t_n − t_{n−1}).

By a Poisson process with parameter λ on 0 ≤ t < ∞, we mean a pure birth process X(t), 0 ≤ t < ∞, having state space {0, 1, 2, ...}, constant birth rate λ_x = λ > 0, and initial value X(0) = 0. According to the above discussion the Poisson process satisfies the following three properties:

(i) X(0) = 0.
(ii) X(t) − X(s) has a Poisson distribution with parameter λ(t − s) for 0 ≤ s < t.
(iii) X(t_2) − X(t_1), X(t_3) − X(t_2), ..., X(t_n) − X(t_{n−1}) are independent for 0 ≤ t_1 < t_2 < ··· < t_n.

The Poisson process can be used to model events occurring in time, such as calls coming into a telephone exchange, customers arriving at a queue, and radioactive disintegrations. Let X(t), 0 ≤ t < ∞, denote the number of events occurring in the time interval (0, t]. For 0 ≤ s < t the random variable X(t) − X(s) denotes the number of events in the time interval (s, t]. If the waiting times between successive events are independent and exponentially distributed with common parameter λ, then X(t), 0 ≤ t < ∞, is a Poisson process and properties (i)-(iii) hold. Property (ii) states that the number of events in any interval has a Poisson distribution. Property (iii) states that the numbers of events in disjoint time intervals are independent. Conversely, if X(t), 0 ≤ t < ∞, satisfies properties (i)-(iii), then the waiting times between successive events are independent and exponentially distributed with common parameter λ, and hence X(t) is a pure birth process with constant birth rate λ. This result was proved in Chapter 9 of Volume I, but will not be needed.

Since the Poisson process is a pure birth process starting in state 0, it follows that for n ≥ 1 the time τ_n of the nth jump equals the time T_n when the process hits state n. When the Poisson process is used to model events occurring in time as described above, the common time τ_n = T_n is the time when the nth event occurs.
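The construction from exponential waiting times is easy to simulate. The sketch below (with arbitrary λ and t, not values from the text) builds the event count X(t) by summing exponential waits and checks property (ii): the count has mean and variance λt.

```python
# Poisson process from independent Exponential(lam) waiting times.
import random

random.seed(1)
lam, t, runs = 2.0, 3.0, 100_000

def count_by(t):
    """Number of events in (0, t]."""
    n, clock = 0, random.expovariate(lam)
    while clock <= t:
        n += 1
        clock += random.expovariate(lam)
    return n

counts = [count_by(t) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(mean, var)  # both close to lam * t = 6.0, as a Poisson(lam*t) count
```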

The Poisson process can be used to construct a variety of other processes.


Example 2. Branching process with immigration. Consider the branching process introduced in Example 1. Suppose that new particles immigrate into the system at random times that form a Poisson process with parameter λ and then give rise to succeeding generations as described in Example 1. Find the birth and death rates of this birth and death process.

Suppose there are initially x particles present. Let ξ_1, ..., ξ_x be the times at which these particles split apart or disappear, and let η be the first time a new particle enters the system. We interpret the description of the system as implying that η is independent of ξ_1, ..., ξ_x. Then ξ_1, ..., ξ_x, η are independent exponentially distributed random variables having respective parameters q, ..., q, λ. Thus

τ_1 = min(ξ_1, ..., ξ_x, η)

is exponentially distributed with parameter q_x = xq + λ, and by (19)

P(τ_1 = η) = λ/(xq + λ).

The event {X(τ_1) = x + 1} occurs if either τ_1 = η, or τ_1 = min(ξ_1, ..., ξ_x) and a particle splits into two new particles at time τ_1. Thus

Q_{x,x+1} = λ/(xq + λ) + (xq/(xq + λ))p.

Also,

Q_{x,x−1} = (xq/(xq + λ))(1 − p).

We conclude that

λ_x = q_x Q_{x,x+1} = λ + xqp

and

μ_x = q_x Q_{x,x−1} = xq(1 − p).

It is also possible to construct a Poisson process with parameter λ on −∞ < t < ∞. We first construct two independent Poisson processes X_1(t), 0 ≤ t < ∞, and X_2(t), 0 ≤ t < ∞, both having parameter λ. We then define X(t), −∞ < t < ∞, by

X(t) = −X_1(−t) for t < 0, and X(t) = X_2(t) for t ≥ 0.


It is easy to show that the process X(t), −∞ < t < ∞, so constructed, satisfies the following three properties:

(i) X(0) = 0.
(ii) X(t) − X(s) has a Poisson distribution with parameter λ(t − s) for s < t.
(iii) X(t_2) − X(t_1), ..., X(t_n) − X(t_{n−1}) are independent for t_1 < t_2 < ··· < t_n.

3.2.3. Pure birth process. Consider a pure birth process X(t), 0 ≤ t < ∞, on {0, 1, 2, ...}. The forward equation (21) reduces to

(38) P′_xy(t) = λ_{y−1} P_{x,y−1}(t) − λ_y P_xy(t), t ≥ 0.

Since the process moves only to the right,

(39) P_xy(t) = 0, y < x and t ≥ 0.

It follows from (38) and (39) that

P′_xx(t) = −λ_x P_xx(t), t ≥ 0.

Since P_xx(0) = 1 and P_xy(0) = 0 for y > x, we conclude from (23) that

(40) P_xx(t) = e^{−λ_x t}, t ≥ 0,

and

(41) P_xy(t) = λ_{y−1} ∫_0^t e^{−λ_y(t−s)} P_{x,y−1}(s) ds, y > x and t ≥ 0.

We can use (40) and (41) to find P_xy(t) recursively for y > x. In particular,

P_{x,x+1}(t) = λ_x ∫_0^t e^{−λ_{x+1}(t−s)} e^{−λ_x s} ds,

and hence for t ≥ 0

(42) P_{x,x+1}(t) = (λ_x/(λ_{x+1} − λ_x))( e^{−λ_x t} − e^{−λ_{x+1} t} ) if λ_{x+1} ≠ λ_x, while P_{x,x+1}(t) = λ_x t e^{−λ_x t} if λ_{x+1} = λ_x.

Example 3. Linear birth process. Consider a pure birth process on {0, 1, 2, ...} having birth rates

λ_x = xλ, x ≥ 0,

for some positive constant λ (the branching process with p = 1 is of this form). Find P_xy(t).

As noted above, P_xy(t) = 0 for y < x and

P_xx(t) = e^{−λ_x t} = e^{−xλt}.

We see from (42) that

P_{x,x+1}(t) = x e^{−xλt}(1 − e^{−λt}).

To compute P_{x,x+2}(t) we set y = x + 2 in (41) and obtain

P_{x,x+2}(t) = (x + 1)xλ ∫_0^t e^{−(x+2)λ(t−s)} e^{−xλs}(1 − e^{−λs}) ds
= (x + 1)xλ e^{−(x+2)λt} ∫_0^t e^{2λs}(1 − e^{−λs}) ds
= (x + 1)xλ e^{−(x+2)λt} ∫_0^t e^{λs}(e^{λs} − 1) ds
= (x + 1)xλ e^{−(x+2)λt} (e^{λt} − 1)²/(2λ)
= (x+1 choose 2) e^{−xλt}(1 − e^{−λt})².

It is left as an exercise for the reader to show by induction that

(43) P_xy(t) = (y−1 choose y−x) e^{−xλt}(1 − e^{−λt})^{y−x}, y ≥ x and t ≥ 0.
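Formula (43) can be checked by simulating the linear birth process directly from its holding times: in state x the process waits an Exponential(xλ) time and then jumps to x + 1. The sketch below (with made-up λ, t, and starting state, not values from the text) compares the empirical distribution of X(t) with (43).

```python
# Simulating the linear birth (Yule) process and checking formula (43):
# P_xy(t) = C(y-1, y-x) * exp(-x*lam*t) * (1 - exp(-lam*t))**(y-x).
import math, random

random.seed(2)
lam, t, x0, runs = 1.0, 0.7, 2, 100_000

def simulate():
    x, clock = x0, 0.0
    while True:
        clock += random.expovariate(x * lam)  # holding time in state x
        if clock > t:
            return x
        x += 1

def p_xy(x, y):
    return (math.comb(y - 1, y - x) * math.exp(-x * lam * t)
            * (1 - math.exp(-lam * t)) ** (y - x))

samples = [simulate() for _ in range(runs)]
for y in (2, 3, 4):
    print(y, samples.count(y) / runs, p_xy(x0, y))  # empirical vs exact
```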

3.2.4. Infinite server queue. Suppose that customers arrive for service according to a Poisson process with parameter λ and that each customer starts being served immediately upon his arrival (i.e., that there are an infinite number of servers). Suppose that the service times are independent and exponentially distributed with parameter μ. Let X(t), 0 ≤ t < ∞, denote the number of customers in the process of being served at time t. This birth and death process, called an infinite server queue, is a special case of the branching process with immigration corresponding to q = μ and p = 0. We conclude that λ_x = λ and μ_x = xμ, x ≥ 0. The transition function P_xy(t) will now be obtained by a probabilistic argument.

Let Y(t) denote the number of customers who arrive in the time interval (0, t]. An interesting and useful result about the Poisson process is that conditioned on Y(t) = k, the times when the first k customers arrive are distributed as k independent random variables each uniformly distributed on (0, t]. In order to see intuitively why this should be true, consider an arbitrary partition 0 = t_0 < t_1 < ··· < t_m = t of [0, t] and let X_i denote the number of customers arriving between time t_{i−1} and time t_i. Then X_1, ..., X_m are independent random variables having Poisson distributions with respective parameters

λ(t_1 − t_0), ..., λ(t_m − t_{m−1}),

and X_1 + ··· + X_m = Y(t) has a Poisson distribution with parameter λt. Thus for x_1, ..., x_m nonnegative integers adding up to k,

P(X_1 = x_1, ..., X_m = x_m | Y(t) = k)
= P(X_1 = x_1, ..., X_m = x_m | X_1 + ··· + X_m = k)
= P(X_1 = x_1, ..., X_m = x_m, X_1 + ··· + X_m = k)/P(X_1 + ··· + X_m = k)
= P(X_1 = x_1, ..., X_m = x_m)/P(X_1 + ··· + X_m = k)
= [ Π_{i=1}^m ((λ(t_i − t_{i−1}))^{x_i} e^{−λ(t_i − t_{i−1})})/x_i! ] / [ ((λt)^k e^{−λt})/k! ]
= (k!/(x_1! ··· x_m!)) Π_{i=1}^m ((t_i − t_{i−1})/t)^{x_i}.

But these multinomial probabilities are just those that would be obtained by choosing the k arrival times independently and uniformly distributed over (0, t].

If a customer arrives at time s ∈ (0, t], the probability that he is still in the process of being served at time t is e^{−μ(t−s)}. Thus if a customer arrives at a time chosen uniformly from (0, t], the probability that he is still in the process of being served at time t is

p_t = (1/t) ∫_0^t e^{−μ(t−s)} ds = (1 − e^{−μt})/(μt).

Let X_1(t) denote the number of customers arriving in (0, t] that are still in the process of being served at time t. It follows from the results of the previous two paragraphs that the conditional distribution of X_1(t) given that Y(t) = k is a binomial distribution with parameters k and p_t, i.e., that

P(X_1(t) = n | Y(t) = k) = (k choose n) p_t^n (1 − p_t)^{k−n}.

Since Y(t) has a Poisson distribution with parameter λt, we conclude that

P(X_1(t) = n) = Σ_{k=n}^∞ P(Y(t) = k, X_1(t) = n)
= Σ_{k=n}^∞ P(Y(t) = k) P(X_1(t) = n | Y(t) = k)
= Σ_{k=n}^∞ ((λt)^k e^{−λt}/k!) · (k!/(n!(k − n)!)) p_t^n (1 − p_t)^{k−n}
= ((λt p_t)^n e^{−λt}/n!) Σ_{k=n}^∞ (λt(1 − p_t))^{k−n}/(k − n)!
= ((λt p_t)^n e^{−λt}/n!) Σ_{m=0}^∞ (λt(1 − p_t))^m/m!
= ((λt p_t)^n e^{−λt}/n!) e^{λt(1−p_t)}
= ((λt p_t)^n e^{−λt p_t})/n!.

Thus X_1(t) has a Poisson distribution with parameter

λt p_t = (λ/μ)(1 − e^{−μt}).

Let x denote the number of customers present initially and let X_2(t) denote the number of these customers still in the process of being served at time t. Then X_2(t) is independent of X_1(t) and has a binomial distribution with parameters x and e^{−μt}. Since X(t) = X_1(t) + X_2(t), we conclude that

P_xy(t) = P_x(X(t) = y) = Σ_{k=0}^{min(x,y)} P_x(X_2(t) = k) P(X_1(t) = y − k).

Therefore

(44) P_xy(t) = Σ_{k=0}^{min(x,y)} (x choose k) e^{−kμt}(1 − e^{−μt})^{x−k} · ((λt p_t)^{y−k} e^{−λt p_t})/(y − k)!.

As t → ∞, e^{−μt} → 0, and hence the terms in the sum in (44) all approach 0 except the term corresponding to k = 0. Consequently,

(45) lim_{t→∞} P_xy(t) = ((λ/μ)^y e^{−λ/μ})/y!.

3.3. Properties of a Markov pure jump process

In this section we will discuss the notions of recurrence, transience, irreducibility, stationary distributions, and positive recurrence of Markov pure jump processes. The results will be described briefly and without proofs, as they are very similar to those for the Markov chains discussed in Chapters 1 and 2. In Section 3.3.1 we apply these results to birth and death processes.

Let X(t), 0 ≤ t < ∞, be a Markov pure jump process having state space 𝒮. For y ∈ 𝒮 and X(0) ≠ y, the first visit to y takes place at time

T_y = min(t > 0 : X(t) = y).

If X(0) = y, then min(t > 0 : X(t) = y) = 0. A more useful random variable in this case is the time T_y of the first return to y after the process leaves y. Both cases are covered by setting

T_y = min(t ≥ τ_1 : X(t) = y).

Here τ_1 is the time of the first jump. If τ_1 = ∞ or X(t) ≠ y for all t ≥ τ_1, we set T_y = ∞.

If x is an absorbing state, set ρ_xy = δ_xy; and if x is a non-absorbing state, set

ρ_xy = P_x(T_y < ∞).

A state y ∈ 𝒮 is called recurrent if ρ_yy = 1 and transient if ρ_yy < 1. The process is said to be a recurrent process if all of its states are recurrent and a transient process if all of its states are transient. The process is called irreducible if ρ_xy > 0 for all choices of x ∈ 𝒮 and y ∈ 𝒮.

The function P(x, y) = Q_xy, x ∈ 𝒮 and y ∈ 𝒮, is the transition function of a Markov chain called the embedded chain. The quantities ρ_xy for this Markov chain are equal to the corresponding quantities for the Markov pure jump process. This relationship shows that results of Chapter 1 involving only the numbers ρ_xy are also valid in the present context. In particular, an irreducible process is either a recurrent process or a transient process. It is recurrent if and only if the embedded chain is recurrent.

If π(x), x ∈ 𝒮, are nonnegative numbers summing to one and if

(46) Σ_x π(x)P_xy(t) = π(y), y ∈ 𝒮 and t ≥ 0,

then π is called a stationary distribution. If X(0) has a stationary distribution π for its initial distribution, then

P(X(t) = y) = Σ_x π(x)P_xy(t) = π(y),

so that X(t) has distribution π for all t ≥ 0.

If (46) holds and 𝒮 is finite, we can differentiate this equation and obtain

(47) Σ_x π(x)P′_xy(t) = 0, y ∈ 𝒮 and t ≥ 0.

In particular, by setting t = 0 in (47), we conclude from (11) that

(48) Σ_x π(x)q_xy = 0, y ∈ 𝒮.

It can be shown that (47) and (48) are valid even if 𝒮 is an infinite set. Suppose, conversely, that (48) holds. If 𝒮 is finite we conclude from the backward equation (14) that

d/dt Σ_x π(x)P_xy(t) = Σ_x π(x)P′_xy(t)
= Σ_x π(x) ( Σ_z q_xz P_zy(t) )
= Σ_z ( Σ_x π(x)q_xz ) P_zy(t)
= 0.

Thus

Σ_x π(x)P_xy(t)

is a constant in t and the constant value is given by

Σ_x π(x)P_xy(0) = Σ_x π(x)δ_xy = π(y).

Consequently (46) holds. This conclusion is also valid if 𝒮 is infinite, but the proof is much more complicated. In summary, (46) holds if and only if (48) holds.

A non-absorbing recurrent state x is called positive recurrent or null recurrent according as the mean return time m_x = E_x(T_x) is finite or infinite. An absorbing state is considered to be positive recurrent. The process is said to be a positive recurrent process if all its states are positive recurrent and a null recurrent process if all its states are null recurrent. An irreducible recurrent process must be either a null recurrent process or a positive recurrent process. It can be shown that a stationary distribution is concentrated on the positive recurrent states, and hence a process that is transient or null recurrent has no stationary distribution. An irreducible positive recurrent process has a unique stationary distribution π, which, unless 𝒮 consists of a single necessarily absorbing state, is given by

(49) π(x) = 1/(q_x m_x), x ∈ 𝒮.

Formula (49) is intuitively reasonable. For in a large time interval [0, t], the process makes about t/m_x visits to x and the average time in x per visit is 1/q_x. Thus the total time spent in state x during the time interval [0, t] should be about t/(q_x m_x) and the proportion of time spent in state x should be about 1/(q_x m_x). This argument can be made rigorous by using the strong law of large numbers as was done in Section 2.3.

Markov pure jump processes do not have any periodicities, and, in particular, for an irreducible positive recurrent process having stationary distribution π,

(50) lim_{t→∞} P_xy(t) = π(y), x, y ∈ 𝒮.

If X(0) has the initial distribution π_0(x), x ∈ 𝒮, then

P(X(t) = y) = Σ_x π_0(x)P_xy(t),

which, by (50) and the bounded convergence theorem, converges to

Σ_x π_0(x)π(y) = π(y)

as t → ∞. In other words,

lim_{t→∞} P(X(t) = y) = π(y),

and hence the distribution of X(t) converges to the stationary distribution π regardless of the initial distribution of the process.

3.3.1. Applications to birth and death processes. Let X(t), 0 ≤ t < ∞, be an irreducible birth and death process on {0, 1, 2, ...}. The process is transient if and only if the embedded birth and death chain having transition function P(x, y) = Q_xy, x ≥ 0 and y ≥ 0, is transient. From (18) in this chapter and the results in Section 1.7, we conclude that the birth and death process is transient if and only if

(51) Σ_{x=1}^∞ (μ_1 ··· μ_x)/(λ_1 ··· λ_x) < ∞.

Equation (48) for a stationary distribution π becomes

(52)    π(1)μ₁ − π(0)λ₀ = 0,
        π(y + 1)μ_{y+1} − π(y)λ_y = π(y)μ_y − π(y − 1)λ_{y−1},    y ≥ 1.

It follows easily from (52) and induction that

    π(y + 1)μ_{y+1} − π(y)λ_y = 0,    y ≥ 0,


and hence that

    π(y + 1) = (λ_y / μ_{y+1}) π(y),    y ≥ 0.

Consequently,

(53)    π(x) = (λ₀ ··· λ_{x−1})/(μ₁ ··· μ_x) π(0),    x ≥ 1.

Set

(54)    π_x = 1 for x = 0,  and  π_x = (λ₀ ··· λ_{x−1})/(μ₁ ··· μ_x) for x ≥ 1.

Then (53) can be written as

(55)    π(x) = π_x π(0),    x ≥ 0.

Conversely, (52) follows from (54) and (55). Suppose now that Σ_x π_x < ∞, i.e., that

(56)    Σ_{x=1}^∞ (λ₀ ··· λ_{x−1})/(μ₁ ··· μ_x) < ∞.

We conclude from (55) that the birth and death process has a unique stationary distribution π, given by

(57)    π(x) = π_x / Σ_{y=0}^∞ π_y,    x ≥ 0.

If (56) fails to hold, the birth and death process has no stationary distribution.

In summary, an irreducible birth and death process on {0, 1, 2, ...} is transient if and only if (51) holds, positive recurrent if and only if (56) holds, and null recurrent if and only if (51) and (56) both fail to hold, i.e., if and only if

(58)    Σ_{x=1}^∞ (μ₁ ··· μ_x)/(λ₀ ··· λ_{x−1}) = ∞  and  Σ_{x=1}^∞ (λ₀ ··· λ_{x−1})/(μ₁ ··· μ_x) = ∞.

An irreducible birth and death process having finite state space {0, 1, ..., d} is necessarily positive recurrent. It has a unique stationary distribution given by

(59)    π(x) = π_x / Σ_{y=0}^d π_y,    0 ≤ x ≤ d,

where π_x, 0 ≤ x ≤ d, is given by (54).
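The weights (54) and the normalization (59) translate directly into code. The following sketch (illustrative rates of our own choosing; Python is our choice of language, not the text's) computes the stationary distribution of a finite birth and death process.

```python
# Stationary distribution (59) of a finite birth and death process on
# {0, ..., d}, built from the weights pi_x of (54).  Rates are examples.

def stationary(birth, death):
    """birth[x] = lambda_x for 0 <= x < d; death[x] = mu_{x+1} for the same x."""
    w = [1.0]                          # pi_0 = 1, as in (54)
    for lam, mu in zip(birth, death):
        w.append(w[-1] * lam / mu)     # pi_{x+1} = pi_x * lambda_x / mu_{x+1}
    total = sum(w)
    return [v / total for v in w]

pi = stationary(birth=[2.0, 1.0, 0.5], death=[1.0, 1.0, 2.0])   # d = 3
assert abs(sum(pi) - 1.0) < 1e-12
# Detailed balance pi(x) * lambda_x = pi(x+1) * mu_{x+1}, consistent with (52)
assert abs(pi[0] * 2.0 - pi[1] * 1.0) < 1e-12
```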


Example 4. Show that the infinite server queue is positive recurrent and find its stationary distribution.

The infinite server queue has state space {0, 1, 2, ...} and birth and death rates

    λ_x = λ  and  μ_x = xμ,    x ≥ 0.

This process is clearly irreducible. It follows from (54) that

    π_x = (λ/μ)^x / x!,    x ≥ 0.

Since

    Σ_{x=0}^∞ (λ/μ)^x / x! = e^{λ/μ}

is finite, we conclude that the process is positive recurrent and has the unique stationary distribution π given by

(60)    π(x) = ((λ/μ)^x / x!) e^{−λ/μ},    x ≥ 0,

which we note is a Poisson distribution with parameter λ/μ. We also note that (50) holds for this process, a direct consequence of (45) and (60).

Example 5. N server queue. Suppose customers arrive according to a Poisson process with parameter λ > 0. They are served by N servers, where N is a finite positive number. Suppose the service times are exponentially distributed with parameter μ and that whenever there are more than N customers waiting for service the excess customers form a queue and wait until their turn at one of the N servers. This process is a birth and death process on {0, 1, 2, ...} with birth rates λ_x = λ, x ≥ 0, and death rates

    μ_x = xμ for 0 < x ≤ N,  and  μ_x = Nμ for x > N.

Determine when this process is transient, null recurrent, and positive recurrent; and find the stationary distribution in the positive recurrent case.

Condition (51) for transience reduces to

    Σ_{x=0}^∞ (Nμ/λ)^x < ∞.


Thus the N server queue is transient if and only if Nμ < λ. Condition (56) for positive recurrence reduces to

    Σ_{x=0}^∞ (λ/(Nμ))^x < ∞.

The N server queue is therefore positive recurrent if and only if λ < Nμ. Consequently the N server queue is null recurrent if and only if λ = Nμ. These results naturally are similar to those for the 1 server queue discussed in Chapters 1 and 2.
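The classification just derived can be packaged as a small routine. The sketch below (hypothetical rates; Python is our own choice of illustration) also checks that the tail of the weights π_x from (54) is geometric with ratio λ/(Nμ), which is what drives conditions (51) and (56).

```python
# Classification of the N server queue by comparing lambda with N * mu,
# as given by conditions (51) and (56).  All rates are example values.

def classify(lam, mu, N):
    if lam > N * mu:
        return "transient"
    if lam < N * mu:
        return "positive recurrent"
    return "null recurrent"

assert classify(5.0, 1.0, 3) == "transient"           # N*mu < lambda
assert classify(2.0, 1.0, 3) == "positive recurrent"  # lambda < N*mu
assert classify(3.0, 1.0, 3) == "null recurrent"      # lambda = N*mu

# In the positive recurrent case the weights pi_x of (54) have a
# geometric tail with ratio lambda / (N * mu):
lam, mu, N = 2.0, 1.0, 3
w = [1.0]
for x in range(1, 40):
    w.append(w[-1] * lam / (mu * min(x, N)))   # mu_x = min(x, N) * mu
assert abs(w[30] / w[29] - lam / (N * mu)) < 1e-12
```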

In the positive recurrent case,

    π_x = (λ/μ)^x / x!  for 0 ≤ x < N,  and  π_x = (λ/μ)^x / (N! N^{x−N})  for x ≥ N.

Set

    K = Σ_{x=0}^∞ π_x = Σ_{x=0}^{N−1} (λ/μ)^x / x! + ((λ/μ)^N / N!)(1 − λ/(Nμ))^{−1}.

We conclude that if λ < Nμ, the stationary distribution is given by

    π(x) = (1/K)(λ/μ)^x / x!  for 0 ≤ x < N,  and  π(x) = (1/K)(λ/μ)^x / (N! N^{x−N})  for x ≥ N.

Exercises

1 Find the transition function of the two-state birth and death process by solving the forward equation.

2 Consider a birth and death process having three states 0, 1, and 2, and birth and death rates such that λ₀ = μ₂. Use the forward equation to find P_0y(t), y = 0, 1, 2.

Exercises 3–8 all refer to events occurring in time according to a Poisson process with parameter λ on 0 ≤ t < ∞. Here X(t) denotes the number of events that occur in the time interval (0, t].

3 Find the conditional probability that there are m events in the first s units of time, given that there are n events in the first t units of time, where 0 ≤ m ≤ n and 0 < s < t.

4 Let T_m denote the time to the mth event. Find the distribution function of T_m. Hint: {T_m ≤ t} = {X(t) ≥ m}.

5 Find the density of the random variable T_m in Exercise 4. Hint: First consider some specific cases, say, m = 1, 2, 3.


6 Find P(T₁ ≤ s | X(t) = n) for 0 < s < t and n a positive integer.

7 Let T be a random variable that is independent of the times when events occur. Suppose that T has an exponential density with parameter ν:

    f_T(t) = νe^{−νt} for t > 0,  and  f_T(t) = 0 for t ≤ 0.

Find the distribution of X(T), which is the number of events occurring by time T. Hint: Use the formulas

    P(X(T) = n) = ∫₀^∞ f_T(t) P(X(T) = n | T = t) dt

and

    P(X(T) = n | T = t) = P(X(t) = n).

8 Solve the previous exercise if T has the gamma density with parameters α and ν:

    f_T(t) = (ν^α / Γ(α)) t^{α−1} e^{−νt} for t > 0,  and  f_T(t) = 0 for t ≤ 0.

9 Verify Equation (43).

10 Consider a pure death process on {0, 1, 2, ...}.
(a) Write the forward equation.
(b) Find P_xx(t).
(c) Solve for P_xy(t) in terms of P_{x,y+1}(t).
(d) Find P_{x,x−1}(t).
(e) Show that if μ_x = xμ, x ≥ 0, for some constant μ, then

    P_xy(t) = (x choose y)(e^{−μt})^y (1 − e^{−μt})^{x−y},    0 ≤ y ≤ x.

11 Let X(t), t ≥ 0, be the infinite server queue and suppose that initially there are x customers present. Compute the mean and variance of X(t).

12 Consider a birth and death process X(t), t ≥ 0, such as the branching process, that has state space {0, 1, 2, ...} and birth and death rates of the form

    λ_x = xλ  and  μ_x = xμ,    x ≥ 0,

where λ and μ are nonnegative constants. Set

    m_x(t) = E_x(X(t)) = Σ_{y=0}^∞ y P_xy(t).

(a) Write the forward equation for the process.
(b) Use the forward equation to show that m_x′(t) = (λ − μ)m_x(t).
(c) Conclude that m_x(t) = x e^{(λ−μ)t}.

13 Let X(t), t ≥ 0, be as in Exercise 12. Set

    s_x(t) = E_x(X²(t)) = Σ_{y=0}^∞ y² P_xy(t).


(a) Use the forward equation to show that

    s_x′(t) = 2(λ − μ)s_x(t) + (λ + μ)m_x(t).

(b) Find s_x(t).
(c) Find Var X(t) under the condition that X(0) = x.

14 Suppose d particles are distributed into two boxes. A particle in box 0 remains in that box for a random length of time that is exponentially distributed with parameter λ before going to box 1. A particle in box 1 remains there for an amount of time that is exponentially distributed with parameter μ before going to box 0. The particles act independently of each other. Let X(t) denote the number of particles in box 1 at time t ≥ 0. Then X(t), t ≥ 0, is a birth and death process on {0, ..., d}.
(a) Find the birth and death rates.
(b) Find P_xd(t). Hint: Let X_i(t), i = 0 or 1, denote the number of particles in box 1 at time t ≥ 0 that started in box i at time 0, so that X(t) = X₀(t) + X₁(t). If X(0) = x, then X₀(t) and X₁(t) are independent and binomially distributed with parameters defined in terms of x and the transition function of the two-state birth and death process.

(c) Find E_x(X(t)).

15 Consider the infinite server queue discussed in Section 3.2.4. Let X₁(t) and X₂(t) be as defined there. Suppose that the initial distribution π₀ is a Poisson distribution with parameter ν.
(a) Use the formula

    P(X₂(t) = k) = Σ_{x=k}^∞ π₀(x) P_x(X₂(t) = k)

to show that X₂(t) has a Poisson distribution with parameter νe^{−μt}.
(b) Use the result of (a) to show that X(t) = X₁(t) + X₂(t) has a Poisson distribution with parameter

    λ/μ + (ν − λ/μ)e^{−μt}.

(c) Conclude that X(t) has the same distribution as X(0) if and only if ν = λ/μ.

16 Consider a birth and death process on the nonnegative integers whose death rates are given by μ_x = x, x ≥ 0. Determine whether the process is transient, null recurrent, or positive recurrent if the birth rates are
(a) λ_x = x + 1, x ≥ 0;
(b) λ_x = x + 2, x ≥ 0.


17 Let X(t), t ≥ 0, be a birth and death process on the nonnegative integers such that λ_x > 0 and μ_x > 0 for x ≥ 1. Set γ₀ = 1 and

    γ_x = (μ₁ ··· μ_x)/(λ₁ ··· λ_x),    x ≥ 1.

(a) Show that if Σ_{y=0}^∞ γ_y = ∞, then ρ_x0 = 1, x ≥ 1.
(b) Show that if Σ_{y=0}^∞ γ_y < ∞, then

    ρ_x0 = (Σ_{y=x}^∞ γ_y)/(Σ_{y=0}^∞ γ_y),    x ≥ 1.

Hint: Use Exercise 26 of Chapter 1.

18 Let X(t), t ≥ 0, be a single server queue (N = 1 in Example 5).
(a) Show that if μ ≥ λ > 0, then ρ_x0 = 1, x ≥ 1.
(b) Show that if μ < λ, then

    ρ_x0 = (μ/λ)^x,    x ≥ 1.

19 Consider the branching process introduced in Example 1. Use Exercise 17 to show that if p ≤ 1/2, then ρ_x0 = 1 for all x, and that if p > 1/2, then

    ρ_x0 = ((1 − p)/p)^x,    x ≥ 1.

20 Find the stationary distribution for the process in Exercise 14.

21 Suppose d machines are subject to failures and repairs. The failure times are exponentially distributed with parameter μ, and the repair times are exponentially distributed with parameter λ. Let X(t) denote the number of machines that are in satisfactory order at time t. If there is only one repairman, then under appropriate reasonable assumptions, X(t), t ≥ 0, is a birth and death process on {0, 1, ..., d} with birth rates λ_x = λ, 0 ≤ x < d, and death rates μ_x = xμ, 0 ≤ x ≤ d. Find the stationary distribution for this process.

22 Consider a positive recurrent irreducible birth and death process on 𝒮 = {0, 1, 2, ...}, and let X(0) have the stationary distribution π for its initial distribution. Then X(t) has distribution π for all t ≥ 0. The quantities

    Eλ_{X(t)} = Σ_{x=0}^∞ λ_x π(x)  and  Eμ_{X(t)} = Σ_{x=0}^∞ μ_x π(x)

can be interpreted, respectively, as the average birth rate and the average death rate of the process.
(a) Show that the average birth rate equals the average death rate.
(b) What does (a) imply about a positive recurrent N server queue?


4  Second Order Processes

A stochastic process can be defined quite generally as any collection of random variables X(t), t ∈ T, defined on a common probability space, where T is a subset of (−∞, ∞) and is usually thought of as the time parameter set. The process is called a continuous parameter process if T is an interval having positive length and a discrete parameter process if T is a subset of the integers. If T = {0, 1, 2, ...} it is usual to denote the process by X_n, n ≥ 0. The Markov chains discussed in Chapters 1 and 2 are discrete parameter processes, while the pure jump processes discussed in Chapter 3 are continuous parameter processes.

A stochastic process X(t), t ∈ T, is called a second order process if EX²(t) < ∞ for each t ∈ T. Second order processes and random variables defined in terms of them by various "linear" operations including integration and differentiation are the subjects of this and the next two chapters. We will obtain formulas for the means, variances, and covariances of such random variables.

We will consider continuous parameter processes almost exclusively in these three chapters. Since no new techniques are needed for handling the analogous results for discrete parameter processes, little would be gained by treating such processes in detail.

4.1. Mean and covariance functions

Let X(t), t ∈ T, be a second order process. The mean function μ_X(t), t ∈ T, of the process is defined by

    μ_X(t) = EX(t).

The covariance function r_X(s, t), s ∈ T and t ∈ T, is defined by

    r_X(s, t) = cov(X(s), X(t)) = EX(s)X(t) − EX(s)EX(t).

This function is also called the auto-covariance function to distinguish it from the cross-covariance function which will be defined later. Since Var X(t) = cov(X(t), X(t)), the variance of X(t) can be expressed in terms of the covariance function as

(1)    Var X(t) = r_X(t, t),    t ∈ T.

By a finite linear combination of the random variables X(t), t ∈ T, we mean a random variable of the form

    Σ_{j=1}^n b_j X(t_j),

where n is a positive integer, t₁, ..., t_n are points in T, and b₁, ..., b_n are real constants. The covariance between two such finite linear combinations is given by

    cov(Σ_{i=1}^m a_i X(s_i), Σ_{j=1}^n b_j X(t_j)) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j r_X(s_i, t_j).

In particular,

(2)    Var(Σ_{j=1}^n b_j X(t_j)) = Σ_{i=1}^n Σ_{j=1}^n b_i b_j r_X(t_i, t_j).

It follows immediately from the definition of the covariance function that it is symmetric in s and t, i.e., that

(3)    r_X(s, t) = r_X(t, s),    s, t ∈ T.

It is also nonnegative definite. That is, if n is a positive integer, t₁, ..., t_n are in T, and b₁, ..., b_n are real numbers, then

    Σ_{i=1}^n Σ_{j=1}^n b_i b_j r_X(t_i, t_j) ≥ 0.

This is an immediate consequence of (2). We say that X(t), −∞ < t < ∞, is a second order stationary process if for every number τ the second order process Y(t), −∞ < t < ∞, defined by

    Y(t) = X(t + τ),    −∞ < t < ∞,

has the same mean and covariance functions as the X(t) process. It is left as an exercise for the reader to show that this is the case if and only if μ_X(t) is independent of t and r_X(s, t) depends only on the difference between s and t.

Let X(t), −∞ < t < ∞, be a second order stationary process. Then

    μ_X(t) = μ_X,    −∞ < t < ∞,


where μ_X denotes the common mean of the random variables X(t), −∞ < t < ∞. Since r_X(s, t) depends only on the difference between s and t,

(4)    r_X(s, t) = r_X(0, t − s),    −∞ < s, t < ∞.

The function r_X(t), −∞ < t < ∞, defined by

(5)    r_X(t) = r_X(0, t),    −∞ < t < ∞,

is also called the covariance function (or auto-covariance function) of the process. We see from (4) and (5) that

    r_X(s, t) = r_X(t − s),    −∞ < s, t < ∞.

It follows from (3) that r_X(t) is symmetric about the origin, i.e., that

    r_X(−t) = r_X(t),    −∞ < t < ∞.

The random variables X(t), −∞ < t < ∞, have a common variance given by

    Var X(t) = r_X(0),    −∞ < t < ∞.

Recall Schwarz's inequality, which asserts that if X and Y are random variables having finite second moment, then (EXY)² ≤ EX²·EY². Applying Schwarz's inequality to the random variables X − EX and Y − EY, we see that

    (cov(X, Y))² ≤ Var X · Var Y.

It follows from this last inequality that

    |cov(X(0), X(t))| ≤ √(Var X(0) Var X(t)),

and hence that

    |r_X(t)| ≤ r_X(0),    −∞ < t < ∞.

If r_X(0) > 0, the correlation between X(s) and X(s + t) is given independently of s by

    cov(X(s), X(s + t)) / (√(Var X(s)) √(Var X(s + t))) = r_X(t) / r_X(0),    −∞ < s, t < ∞.

Example 1. Let Z₁ and Z₂ be independent normally distributed random variables each having mean 0 and variance σ². Let λ be a real constant and set

    X(t) = Z₁ cos λt + Z₂ sin λt,    −∞ < t < ∞.

Find the mean and covariance functions of X(t), −∞ < t < ∞, and show that it is a second order stationary process.

We observe first that

    μ_X(t) = EZ₁ cos λt + EZ₂ sin λt = 0,    −∞ < t < ∞.


Next,

    r_X(s, t) = cov(X(s), X(t))
             = EX(s)X(t) − EX(s)EX(t)
             = EX(s)X(t)
             = E(Z₁ cos λs + Z₂ sin λs)(Z₁ cos λt + Z₂ sin λt)
             = EZ₁² cos λs cos λt + EZ₂² sin λs sin λt
             = σ²(cos λs cos λt + sin λs sin λt)
             = σ² cos λ(t − s).

This shows that X(t), −∞ < t < ∞, is a second order stationary process having mean zero and covariance function

    r_X(t) = σ² cos λt,    −∞ < t < ∞.
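Example 1's covariance can also be checked by simulation. The sketch below is a Monte Carlo illustration of our own (arbitrary σ, λ, s, t; not part of the text): it draws many independent copies of (Z₁, Z₂) and compares the sample mean of X(s)X(t) with σ² cos λ(t − s).

```python
import math
import random

# Monte Carlo check of Example 1: X(t) = Z1 cos(lt) + Z2 sin(lt) has
# covariance sigma^2 * cos(l * (t - s)).  All parameters are examples.
random.seed(7)
sigma, l = 1.5, 2.0
s, t, n = 0.3, 1.1, 100_000

acc = 0.0
for _ in range(n):
    z1 = random.gauss(0.0, sigma)
    z2 = random.gauss(0.0, sigma)
    xs = z1 * math.cos(l * s) + z2 * math.sin(l * s)
    xt = z1 * math.cos(l * t) + z2 * math.sin(l * t)
    acc += xs * xt            # the mean is zero, so E[X(s)X(t)] = r_X(s, t)

estimate = acc / n
exact = sigma ** 2 * math.cos(l * (t - s))
assert abs(estimate - exact) < 0.05
```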

Example 2. Consider a two-state birth and death process as discussed in Section 3.2.1. It follows from that discussion that the transition probabilities of the process are given by

(6)    P₀₀(t) = 1 − P₀₁(t) = μ/(λ + μ) + (λ/(λ + μ)) e^{−(λ+μ)t},    t ≥ 0,
       P₁₁(t) = 1 − P₁₀(t) = λ/(λ + μ) + (μ/(λ + μ)) e^{−(λ+μ)t},    t ≥ 0,

where λ and μ are positive constants. The process has the stationary distribution defined by

(7)    π(0) = μ/(λ + μ)  and  π(1) = λ/(λ + μ).

In Chapter 3 we discussed birth and death processes defined on 0 ≤ t < ∞. Actually, in the positive recurrent case it is possible to construct a corresponding process on −∞ < t < ∞ having the stationary distribution determined by (7). This process will be such that

(8)    P(X(t) = 0) = μ/(λ + μ)  and  P(X(t) = 1) = λ/(λ + μ),    −∞ < t < ∞,

and such that the Markov property

(9)    P(X(t) = y | X(s) = x) = P_xy(t − s),    −∞ < s < t < ∞,


holds, where P_xy(t), t ≥ 0, is given by (6). We will show that such a process is a second order stationary process and find its mean and covariance functions.

The mean function is given by

    μ_X(t) = EX(t) = 0 · P(X(t) = 0) + 1 · P(X(t) = 1) = λ/(λ + μ).

Let −∞ < s ≤ t < ∞. Then

    EX(s)X(t) = P(X(s) = 1 and X(t) = 1)
             = P(X(s) = 1) P(X(t) = 1 | X(s) = 1)
             = P(X(s) = 1) P₁₁(t − s)
             = (λ/(λ + μ)) (λ/(λ + μ) + (μ/(λ + μ)) e^{−(λ+μ)(t−s)})
             = (λ/(λ + μ))² + (λμ/(λ + μ)²) e^{−(λ+μ)(t−s)}.

It follows that

    r_X(s, t) = (λμ/(λ + μ)²) e^{−(λ+μ)(t−s)},    −∞ < s ≤ t < ∞.

By symmetry we see that

    r_X(s, t) = (λμ/(λ + μ)²) e^{−(λ+μ)|t−s|},    −∞ < s, t < ∞.

Thus X(t), −∞ < t < ∞, is a second order stationary process having mean λ/(λ + μ) and covariance function

    r_X(t) = (λμ/(λ + μ)²) e^{−(λ+μ)|t|},    −∞ < t < ∞.
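The algebra in Example 2 can be verified numerically: with P₁₁ taken from (6) and P(X(s) = 1) from the stationary distribution (7), the covariance computed directly agrees with the closed form. The rates and times below are arbitrary illustrative values.

```python
import math

# Deterministic check of Example 2: with P11 from (6) and the stationary
# distribution (7), E[X(s)X(t)] - E[X(s)]E[X(t)] reduces to
# lam*mu/(lam+mu)^2 * exp(-(lam+mu)*(t-s)).  Rates are example values.
lam, mu = 2.0, 3.0
s, t = 0.4, 1.7

p1 = lam / (lam + mu)                              # P(X(s) = 1), from (7)
P11 = p1 + (mu / (lam + mu)) * math.exp(-(lam + mu) * (t - s))
cov = p1 * P11 - p1 * p1                           # E[X(s)X(t)] - mean^2

exact = (lam * mu / (lam + mu) ** 2) * math.exp(-(lam + mu) * (t - s))
assert abs(cov - exact) < 1e-12
```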

Other interesting examples of second order processes can be obtained from Poisson processes.

Example 3. Consider a Poisson process X(t), −∞ < t < ∞, with parameter λ (see Section 3.2.2). This process satisfies the following properties:

(i) X(0) = 0.
(ii) X(t) − X(s) has a Poisson distribution with mean λ(t − s) for s ≤ t.
(iii) X(t₂) − X(t₁), X(t₃) − X(t₂), ..., X(t_n) − X(t_{n−1}) are independent for t₁ ≤ t₂ ≤ ··· ≤ t_n.


We will now find the mean and covariance function of a process X(t), −∞ < t < ∞, satisfying (i)–(iii). It follows from properties (i) and (ii) that X(t) has a Poisson distribution with mean λt for t ≥ 0 and −X(t) has a Poisson distribution with mean λ(−t) for t < 0. Thus

    μ_X(t) = λt,    −∞ < t < ∞.

Since the variance of a Poisson distribution equals its mean, we see that X(t) has finite second moment and that Var X(t) = λ|t|. Let 0 ≤ s ≤ t. Then

    cov(X(s), X(s)) = Var X(s) = λs.

It follows from properties (i) and (iii) that X(s) and X(t) − X(s) are independent, and hence

    cov(X(s), X(t) − X(s)) = 0.

Thus

    cov(X(s), X(t)) = cov(X(s), X(s) + X(t) − X(s))
                   = cov(X(s), X(s)) + cov(X(s), X(t) − X(s))
                   = λs.

If s < 0 and t > 0, then by properties (i) and (iii) the random variables X(s) and X(t) are independent, and hence

    cov(X(s), X(t)) = 0.

The other cases can be handled similarly. We find in general that

(10)    r_X(s, t) = λ min(|s|, |t|) for st > 0,  and  r_X(s, t) = 0 for st ≤ 0.
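Formula (10) can be illustrated by simulation, using the independent-increments property to generate the pair (X(s), X(t)). The Poisson sampler and the particular λ, s, t below are our own illustrative choices, not part of the text.

```python
import math
import random

# Monte Carlo check of (10): for a Poisson process with rate lam and
# 0 < s < t, cov(X(s), X(t)) = lam * s.  Uses independent increments:
# X(s) ~ Poisson(lam*s), X(t) = X(s) + independent Poisson(lam*(t-s)).
random.seed(11)

def poisson(mean):
    # Knuth's multiplicative method; adequate for small means
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

lam, s, t, n = 2.0, 1.0, 2.5, 100_000
sx = sy = sxy = 0.0
for _ in range(n):
    xs = poisson(lam * s)                 # X(s)
    xt = xs + poisson(lam * (t - s))      # X(t), via property (iii)
    sx += xs
    sy += xt
    sxy += xs * xt
cov = sxy / n - (sx / n) * (sy / n)
assert abs(cov - lam * s) < 0.25          # (10): cov = lam * min(s, t)
```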

The process from Example 3 is not a second order stationary process. In the next example we will consider a closely related process which is a second order stationary process.

Example 4. Let X(t), −∞ < t < ∞, be a Poisson process with parameter λ. Set

    Y(t) = X(t + 1) − X(t),    −∞ < t < ∞.

Find the mean and covariance function of the Y(t) process, and show that it is a second order stationary process.

Since EX(t) = λt, it follows that

    EY(t) = E(X(t + 1) − X(t)) = λ(t + 1) − λt = λ,


so the random variables Y(t) have common mean λ. To compute the covariance function of the Y(t) process, we observe that if |t − s| ≥ 1, then the random variables X(s + 1) − X(s) and X(t + 1) − X(t) are independent by property (iii). Consequently,

    r_Y(s, t) = 0    for |t − s| ≥ 1.

Suppose s ≤ t < s + 1. Then

    cov(Y(s), Y(t)) = cov(X(s + 1) − X(s), X(t + 1) − X(t))
                   = cov(X(t) − X(s) + X(s + 1) − X(t), X(s + 1) − X(t) + X(t + 1) − X(s + 1)).

It follows from property (iii) and the assumptions on s and t that

    cov(X(t) − X(s), X(s + 1) − X(t)) = 0,
    cov(X(t) − X(s), X(t + 1) − X(s + 1)) = 0,

and

    cov(X(s + 1) − X(t), X(t + 1) − X(s + 1)) = 0.

By property (ii),

    cov(X(s + 1) − X(t), X(s + 1) − X(t)) = Var(X(s + 1) − X(t)) = λ(s + 1 − t).

Thus

    cov(Y(s), Y(t)) = λ(s + 1 − t).

By using symmetry we find in general that

    r_Y(s, t) = λ(1 − |t − s|) for |t − s| ≤ 1,  and  r_Y(s, t) = 0 for |t − s| ≥ 1.

Thus Y(t), −∞ < t < ∞, is a second order stationary process having mean λ and covariance function

    r_Y(t) = λ(1 − |t|) for |t| ≤ 1,  and  r_Y(t) = 0 for |t| ≥ 1.

In Figure 1 we have graphed the covariance function for three different second order stationary processes. These covariance functions are special cases of those found in Examples 1, 2, and 4 respectively. In each case r_X(0) = 1 and hence r_X(t) is equal to the correlation between X(0) and X(t). In the top curve of Figure 1 we see that the correlation oscillates between −1 and 1. In the middle curve the correlation decreases exponentially fast as |t| → ∞. In the bottom curve the correlation decreases linearly to zero as |t| increases from 0 to 1 and remains zero for all larger values of |t|.


Figure 1. Covariance functions of three second order stationary processes: r_X(t) = cos t (top); an exponentially decaying covariance function (middle); r_X(t) = 1 − |t| for |t| ≤ 1 and r_X(t) = 0 for |t| ≥ 1 (bottom).

Consider two second order processes X(t), t ∈ T, and Y(t), t ∈ T. Their cross-covariance function is defined as

    r_XY(s, t) = cov(X(s), Y(t)),    s, t ∈ T.

Clearly

    r_XY(s, t) = r_YX(t, s)  and  r_XX(s, t) = r_X(s, t).

The cross-covariance function can be used to find the covariance function of the sum of two processes. Indeed,

    r_{X+Y}(s, t) = cov(X(s) + Y(s), X(t) + Y(t))
                 = r_XX(s, t) + r_XY(s, t) + r_YX(s, t) + r_YY(s, t),

which can be rewritten as

(11)    r_{X+Y}(s, t) = r_X(s, t) + r_XY(s, t) + r_YX(s, t) + r_Y(s, t).


In the important case when the cross-covariance function vanishes, (11) reduces to

(12)    r_{X+Y}(s, t) = r_X(s, t) + r_Y(s, t).

These formulas are readily extended to sums of any finite number of processes. Consider in particular n second order stationary processes X₁(t), −∞ < t < ∞, ..., X_n(t), −∞ < t < ∞, whose cross-covariance functions all vanish. Then their sum

    X(t) = X₁(t) + ··· + X_n(t),    −∞ < t < ∞,

is a second order stationary process such that

(13)    μ_X = μ_{X₁} + ··· + μ_{X_n}

and

(14)    r_X(t) = Σ_{k=1}^n r_{X_k}(t),    −∞ < t < ∞.

Example 5. Let Z₁₁, Z₁₂, Z₂₁, Z₂₂, ..., Z_{n1}, Z_{n2} be 2n independent normally distributed random variables each having mean zero and such that

    EZ_{k1}² = EZ_{k2}² = σ_k²,    k = 1, ..., n.

Let λ₁, ..., λ_n be real constants and set

    X(t) = Σ_{k=1}^n (Z_{k1} cos λ_k t + Z_{k2} sin λ_k t),    −∞ < t < ∞.

Find the mean and covariance functions of X(t), −∞ < t < ∞.

Set

    X_k(t) = Z_{k1} cos λ_k t + Z_{k2} sin λ_k t,    −∞ < t < ∞.

It follows from the independence of the Z's that the cross-covariance function between X_i(t) and X_j(t) vanishes for i ≠ j. Thus by using (13) and (14) together with the results of Example 1, we see that X(t), −∞ < t < ∞, is a second order stationary process having mean zero and covariance function

(15)    r_X(t) = Σ_{k=1}^n σ_k² cos λ_k t,    −∞ < t < ∞.

4.2. Gaussian processes

A stochastic process X(t), t ∈ T, is called a Gaussian process if every finite linear combination of the random variables X(t), t ∈ T, is normally distributed. (In this context constant random variables are regarded as normally distributed with zero variance.) Gaussian processes are also called normal processes, and normally distributed random variables are sometimes said to have a Gaussian distribution. If X(t), t ∈ T, is a Gaussian process, then for each t ∈ T, X(t) is normally distributed and, in particular, EX²(t) < ∞. Thus a Gaussian process is necessarily a second order process. Gaussian processes have many nice theoretical properties that do not hold for second order processes in general. They are also widely used in applications, especially in engineering and in the physical sciences.

Example 6. Show that the process X(t), −∞ < t < ∞, from Example 1 is a Gaussian process.

To verify that this is a Gaussian process, we let n be a positive integer and choose real numbers t₁, ..., t_n and a₁, ..., a_n. Now

    X(t) = Z₁ cos λt + Z₂ sin λt,

where Z₁ and Z₂ are independent and normally distributed. Thus

    a₁X(t₁) + ··· + a_nX(t_n) = Z₁(a₁ cos λt₁ + ··· + a_n cos λt_n) + Z₂(a₁ sin λt₁ + ··· + a_n sin λt_n)

is a linear combination of independent normally distributed random variables and therefore is itself normally distributed.

It is left as an exercise for the reader to show that the process in Example 5 is also a Gaussian process.

Two stochastic processes X(t), t ∈ T, and Y(t), t ∈ T, are said to have the same joint distribution functions if for every positive integer n and every choice of t₁, ..., t_n, all in T, the random variables

    X(t₁), ..., X(t_n)

have the same joint distribution function as the random variables

    Y(t₁), ..., Y(t_n).

One of the most useful properties of Gaussian processes is that if two such processes have the same mean and covariance functions, then they also have the same joint distribution functions. We omit the proof of this result. To see that the Gaussian assumption is necessary, observe that the process defined in Exercise 15 has the same mean and covariance functions as that from Example 1 with σ² = 1 but not the same joint distribution functions.


The mean and covariance functions can also be used to find the higher moments of a Gaussian process.

Example 7. Let X(t), t ∈ T, be a Gaussian process having zero means. Find EX⁴(t) in terms of the covariance function of the process.

We recall that if X is normally distributed with mean 0 and variance σ², then EX⁴ = 3σ⁴. Since X(t) is normally distributed with mean 0 and variance r_X(t, t), we see that

    EX⁴(t) = 3r_X²(t, t).

Let n be a positive integer and let X₁, ..., X_n be random variables. They are said to have a joint normal (or Gaussian) distribution if

    a₁X₁ + ··· + a_nX_n

is normally distributed for every choice of the constants a₁, ..., a_n. A stochastic process X(t), t ∈ T, is a Gaussian process if and only if for every positive integer n and every choice of t₁, ..., t_n all in T, the random variables X(t₁), ..., X(t_n) have a joint normal distribution.

Let X₁, ..., X_n be random variables having a joint normal distribution and a density f with respect to integration on Rⁿ. (Such a density exists if and only if the covariance matrix of X₁, ..., X_n has nonzero determinant.) It can be shown that f is necessarily of the form

(16)    f(x₁, ..., x_n) = (2π)^{−n/2} (det Σ)^{−1/2} exp[−(1/2)(x − μ)′ Σ^{−1} (x − μ)],

where Σ is the covariance matrix whose entry in row i and column j is cov(X_i, X_j), x and μ are the column vectors

    x = (x₁, ..., x_n)′  and  μ = (μ₁, ..., μ_n)′,

and ′ denotes matrix transpose. In particular, if n = 2, then (16) can be written as

(17)    f(x₁, x₂) = (2πσ₁σ₂√(1 − ρ²))^{−1} exp[−Q(x₁, x₂)/2],


where

    Q(x₁, x₂) = (1/(1 − ρ²)) [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²].

Here μ₁ and σ₁² denote the mean and variance of X₁, μ₂ and σ₂² denote the mean and variance of X₂, and ρ denotes the correlation between X₁ and X₂. One can also use (16) to show that the conditional expectation of X_n given X₁, ..., X_{n−1} is a linear function of these n − 1 random variables, i.e., that

    E[X_n | X₁ = x₁, ..., X_{n−1} = x_{n−1}] = a + b₁x₁ + ··· + b_{n−1}x_{n−1}

for suitable constants a, b₁, ..., b_{n−1}.
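For n = 2, one can confirm numerically that the general density (16) agrees with the bivariate form (17). The sketch below is an illustration with arbitrary parameter values, not part of the text.

```python
import math

# Check that the general density (16) reduces to the bivariate form (17)
# for n = 2.  Parameters mu1, mu2, s1, s2, rho are example values.
mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
x1, x2 = 0.7, 0.2

# (16): f = (2*pi)^{-n/2} (det S)^{-1/2} exp(-0.5 (x-m)' S^{-1} (x-m))
S = [[s1 * s1, rho * s1 * s2],
     [rho * s1 * s2, s2 * s2]]          # covariance matrix
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det],
        [-S[1][0] / det, S[0][0] / det]]
d1, d2 = x1 - mu1, x2 - mu2
quad = Sinv[0][0] * d1 * d1 + 2 * Sinv[0][1] * d1 * d2 + Sinv[1][1] * d2 * d2
f16 = math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# (17): f = exp(-Q/2) / (2*pi*s1*s2*sqrt(1-rho^2)), with Q as in the text
Q = ((d1 / s1) ** 2 - 2 * rho * (d1 / s1) * (d2 / s2) + (d2 / s2) ** 2) / (1 - rho ** 2)
f17 = math.exp(-0.5 * Q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

assert abs(f16 - f17) < 1e-12
```

The agreement is exact because det Σ = σ₁²σ₂²(1 − ρ²), so the normalizing constants coincide and the quadratic form (x − μ)′Σ^{−1}(x − μ) reduces algebraically to Q(x₁, x₂).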

A stochastic process X(t), −∞ < t < ∞, is said to be strictly stationary if for every number τ the stochastic process Y(t), −∞ < t < ∞, defined by

    Y(t) = X(t + τ),    −∞ < t < ∞,

has the same joint distribution functions as the X(t) process. A strictly stationary process need not have finite second moments and hence need not be a second order process. It is clear, however, that if a strictly stationary process does have finite second moments, then it is a second order stationary process. The converse is not true in general. It is left as an exercise for the reader to demonstrate by an example that a second order stationary process need not be strictly stationary.

Let X(t), −∞ < t < ∞, be a second order stationary process which is also a Gaussian process. Then this process is necessarily strictly stationary. For if τ is any real number, then the Y(t) process defined by Y(t) = X(t + τ), −∞ < t < ∞, is a Gaussian process having the same mean and covariance functions as the X(t) process. It therefore has the same joint distribution functions as the X(t) process.

Since the processes in Examples 1 and 5 are Gaussian and second order stationary, they are also strictly stationary. The second order stationary processes from Examples 2 and 4 are not Gaussian, but it can be shown that they too are strictly stationary.

4.3. The Wiener process

It has long been known from microscopic observations that particles suspended in a liquid are in a state of constant, highly irregular motion. It gradually came to be realized that the cause of this motion is the bombardment of the particles by the smaller invisible molecules of the


liquid. Such motion is called "Brownian motion," named after one of the first scientists to study it carefully.

Many mathematical models for this physical process have been proposed. We will now describe one such model. Let the location of a particle be described by a Cartesian coordinate system whose origin is the location of the particle at time t = 0. Then the three coordinates of the position of the particle vary independently, each according to a stochastic process W(t), −∞ < t < ∞, satisfying the following properties:

(i) W(0) = 0.
(ii) W(t) − W(s) has a normal distribution with mean 0 and variance σ²(t − s) for s ≤ t.
(iii) W(t2) − W(t1), W(t3) − W(t2), . . . , W(tn) − W(t_{n-1}) are independent for t1 ≤ t2 ≤ · · · ≤ tn.

Here σ² is some positive constant.

Property (i) follows from our choice of the coordinate system. Properties (ii) and (iii) are plausible if the motion is caused by an extremely large number of unrelated and individually negligible collisions which have no more tendency to move the particle in one direction than in the opposite direction. In particular, the central limit theorem makes it reasonable to suppose that the increments W(t) − W(s) are normally distributed.

This model was initiated, in a different form, by Albert Einstein in 1905. He related the parameter σ² to various physical parameters including Avogadro's number. Estimation of σ² together with other measurements in a scientific experiment conducted shortly thereafter led to an estimate of Avogadro's number that is within 19 percent of the presently accepted value. Einstein's work and its experimental confirmation gave added evidence for the atomic basis of matter, which was still being questioned at the turn of the century.

Although the mathematical model is reasonable and fits the experimental data quite well, it has certain theoretical deficiencies that will be discussed in Section 5.3. In Chapter 6 we will discuss another mathematical model for the physical process.

A stochastic process W(t), −∞ < t < ∞, satisfying properties (i)–(iii) is called the Wiener process with parameter σ². Mathematicians Norbert Wiener and Paul Lévy developed much of the theory, and the process is also known as the Wiener–Lévy process and as Brownian motion. The Wiener process is usually assumed to satisfy an additional property involving "continuity of the sample functions," which we will discuss in Section 5.1.2.
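Properties (i)–(iii) translate directly into a simulation recipe: build W on a time grid by summing independent normal increments. The following sketch (the grid, the value σ = 2, and the sample sizes are illustrative choices, not part of the text) checks empirically that W(1) has mean 0 and variance σ² · 1.

```python
import math
import random

def wiener_path(t_grid, sigma=1.0, rng=random):
    """Simulate W at the increasing times in t_grid (starting at 0),
    using property (iii): independent normal increments whose
    variance is sigma^2 * (t_k - t_{k-1}), as in property (ii)."""
    w = [0.0]  # property (i): W(0) = 0
    for t_prev, t_next in zip(t_grid, t_grid[1:]):
        dw = rng.gauss(0.0, sigma * math.sqrt(t_next - t_prev))
        w.append(w[-1] + dw)
    return w

rng = random.Random(0)
t_grid = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
paths = [wiener_path(t_grid, sigma=2.0, rng=rng) for _ in range(20000)]

# W(1) should have mean 0 and variance sigma^2 * 1 = 4
samples = [p[-1] for p in paths]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples)
print(round(mean, 2), round(var, 1))
```

The sample mean and variance should land near 0 and 4 up to Monte Carlo error.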



It follows immediately from the properties of the Wiener process that the random variables W(t) all have mean 0 and that

EW²(t) = σ²|t|,   −∞ < t < ∞.

The covariance function of the process is

(19)    rW(s, t) = σ² min(|s|, |t|) if st > 0,   and   rW(s, t) = 0 if st ≤ 0.

The proof of (19) is virtually identical to that of Formula (10) for the covariance function of the Poisson process defined in Example 3. It is left as an exercise for the reader to show that

(20)    E(W(s) − W(a))(W(t) − W(a)) = σ² min(s − a, t − a),   s ≥ a and t ≥ a.
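Formula (19) can be corroborated numerically for fixed s, t > 0 by averaging W(s)W(t) over many simulated paths; a sketch (the values of σ, s, t, and the path count below are arbitrary):

```python
import random

rng = random.Random(1)
sigma, s, t = 1.5, 0.4, 1.0   # illustrative parameters with 0 < s < t
n = 40000
acc = 0.0
for _ in range(n):
    ws = rng.gauss(0.0, sigma * s ** 0.5)             # W(s) ~ N(0, sigma^2 s)
    wt = ws + rng.gauss(0.0, sigma * (t - s) ** 0.5)  # add an independent increment
    acc += ws * wt
cov = acc / n  # sample estimate of E[W(s)W(t)] (the means are zero)
print(round(cov, 2), sigma ** 2 * min(s, t))
```

The estimate should be close to σ² min(s, t), in agreement with (19).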

The Wiener process is a Gaussian process. In other words, if t1 < · · · < tn and b1, . . . , bn are real constants, the random variable

b1W(t1) + · · · + bnW(tn)

is normally distributed. In proving this result we can assume, with no loss of generality, that one of the numbers t1, . . . , tn, say tk, equals zero. Then each of the random variables W(t1), . . . , W(tn) is a linear combination of the increments W(t2) − W(t1), . . . , W(tn) − W(t_{n-1}). Indeed, W(tk) = 0,

W(tj) = (W(t_{k+1}) − W(tk)) + · · · + (W(tj) − W(t_{j-1})),   k < j ≤ n,

and

W(tj) = −(W(t_{j+1}) − W(tj)) − · · · − (W(tk) − W(t_{k-1})),   1 ≤ j < k.

Thus b1W(t1) + · · · + bnW(tn) can also be written as a linear combination of the increments W(t2) − W(t1), . . . , W(tn) − W(t_{n-1}). Now these increments are independent and normally distributed. Thus any linear combination of them, in particular b1W(t1) + · · · + bnW(tn), is normally distributed.
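Since the joint distribution of W(t1), . . . , W(tn) has covariances given by (19), the variance of such a linear combination is σ² Σ_{i,j} b_i b_j min(t_i, t_j). A quick Monte Carlo check of this (the grid t_i and coefficients b_i below are hypothetical):

```python
import random

sigma = 1.0
ts = [0.5, 1.0, 2.0]   # illustrative times t1 < t2 < t3
bs = [1.0, -2.0, 0.5]  # illustrative coefficients

# Exact variance from the covariance function (19)
exact = sum(bi * bj * sigma ** 2 * min(ti, tj)
            for bi, ti in zip(bs, ts) for bj, tj in zip(bs, ts))

rng = random.Random(2)
n = 50000
acc = 0.0
for _ in range(n):
    w, t_prev, ws = 0.0, 0.0, []
    for t in ts:  # build W(t1), W(t2), W(t3) from independent increments
        w += rng.gauss(0.0, sigma * (t - t_prev) ** 0.5)
        t_prev = t
        ws.append(w)
    v = sum(b * wv for b, wv in zip(bs, ws))
    acc += v * v
mc_var = acc / n  # the linear combination has mean zero
print(exact, round(mc_var, 2))
```

Here the exact value and the Monte Carlo estimate should agree up to sampling error.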

Exercises

1 Let X(t), −∞ < t < ∞, be a second order process. Show that it is a second order stationary process if and only if μX(t) is independent of t and rX(s, t) depends only on the difference between s and t.



2 Let X(t), −∞ < t < ∞, be a second order process. Show that it is a second order stationary process if and only if EX(s) and EX(s)X(s + t) are both independent of s.

3 Let X(t), −∞ < t < ∞, be a second order stationary process and set Y(t) = X(t + 1) − X(t), −∞ < t < ∞. Show that the Y(t) process is a second order stationary process having zero means and covariance function

rY(t) = 2rX(t) − rX(t − 1) − rX(t + 1).

4 Let X(t), −∞ < t < ∞, be a second order stationary process.
(a) Show that

Var(X(s + t) − X(s)) = 2(rX(0) − rX(t)).

(b) Show that for M > 0,

P(|X(s + t) − X(s)| > M) ≤ (2/M²)(rX(0) − rX(t)).

5 Let X(t), −∞ < t < ∞, be a Poisson process with parameter λ and set Y(t) = X(t) − tX(1), 0 ≤ t ≤ 1. Find the mean and covariance functions of the Y(t) process.

6 Let U1, . . . , Un be independent random variables, each uniformly distributed on (0, 1). Let ψ(t, x), 0 ≤ t ≤ 1 and 0 < x < 1, be defined by

ψ(t, x) = 1 if x ≤ t,   and   ψ(t, x) = 0 if x > t.

Then

X(t) = (1/n) Σ_{k=1}^n ψ(t, Uk),   0 ≤ t ≤ 1,

is the empirical distribution function of U1, . . . , Un. Compute the mean and covariance functions of the X(t) process.

7 Let X(t), −∞ < t < ∞, be a second order stationary process having covariance function rX(t), −∞ < t < ∞. Set Y(t) = X(t + 1), −∞ < t < ∞. Find the cross-covariance function between the X(t) process and the Y(t) process.

8 Let R and Θ be independent random variables such that Θ is uniformly distributed on [0, 2π) and R has the density

fR(r) = (r/σ²) e^{−r²/2σ²} for 0 < r < ∞,   and   fR(r) = 0 for r ≤ 0,

where σ is a positive constant. It follows by using the change of variable formula involving Jacobians that R cos Θ and R sin Θ are independent



random variables, each normally distributed with mean 0 and variance σ². Let λ be a positive constant and set

X(t) = R cos(λt + Θ),   −∞ < t < ∞.

Show that the X(t) process is a second order stationary process having mean zero and covariance function

rX(t) = σ² cos λt,   −∞ < t < ∞.

9 Let R1, . . . , Rn, Θ1, . . . , Θn be independent random variables such that the Θ's are uniformly distributed on [0, 2π) and Rk has the density

f_{Rk}(r) = (r/σ_k²) e^{−r²/2σ_k²} for 0 < r < ∞,   and   f_{Rk}(r) = 0 for r ≤ 0,

where σ1, . . . , σn are positive constants. Let λ1, . . . , λn be positive constants and set

X(t) = Σ_{k=1}^n Rk cos(λk t + Θk).

Show that the X(t) process is a second order stationary process having mean zero and covariance function

rX(t) = Σ_{k=1}^n σ_k² cos λk t.

10 Show that the X(t) process in Example 5 is a Gaussian process.

11 Show that the X(t) process in Exercise 9 is a Gaussian process.

12 Let X(t), −∞ < t < ∞, be a Gaussian process and let f and g be functions from (−∞, ∞) to (−∞, ∞). Show that Y(t) = f(t)X(g(t)), −∞ < t < ∞, is a Gaussian process and find its mean and covariance functions.

13 Let X(t), −∞ < t < ∞, be a Gaussian process having mean zero and set Y(t) = X²(t), −∞ < t < ∞.
(a) Find the mean and covariance functions of the Y(t) process.
(b) Show that if the X(t) process is a second order stationary process, then so is the Y(t) process.

14 Let X1 and X2 have the joint density given by (17).
(a) Find the conditional density of X2 given X1 = x1.
(b) Find the conditional expectation of X2 given X1 = x1.

15 Let Z1 and Z2 be independent and identically distributed random variables taking on the values −1 and 1 each with probability 1/2. Show that X(t) = Z1 cos λt + Z2 sin λt, −∞ < t < ∞, is a second order stationary process which is not strictly stationary.



In the remaining problems W(t), −∞ < t < ∞, is the Wiener process with parameter σ².

16 Verify Formula (20).

17 Find the distribution of W(1) + · · · + W(n) for a positive integer n. Hint: Use the formulas

1 + 2 + · · · + n = n(n + 1)/2   and   1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

18 Set

X(t) = (W(t + ε) − W(t))/ε,   −∞ < t < ∞,

where ε is a positive constant. Show that the X(t) process is a stationary Gaussian process having covariance function

rX(t) = (σ²/ε)(1 − |t|/ε) for |t| ≤ ε,   and   rX(t) = 0 for |t| > ε.

19 Set

X(t) = e^{−αt} W(e^{2αt}),   −∞ < t < ∞,

where α is a positive constant. Show that the X(t) process is a stationary Gaussian process having covariance function

rX(t) = σ² e^{−α|t|},   −∞ < t < ∞.

20 Find the mean and covariance functions of the following processes:
(a) X(t) = (W(t))², t ≥ 0;
(b) X(t) = tW(1/t), t > 0;
(c) X(t) = c⁻¹W(c²t), t ≥ 0;
(d) X(t) = W(t) − tW(1), 0 ≤ t ≤ 1.


5  Continuity, Integration, and Differentiation of Second Order Processes

In this chapter we will study integration and differentiation of continuous parameter second order processes. We will see that the Wiener process is not differentiable in the ordinary sense, but leads to a new type of process called "white noise."

5.1. Continuity assumptions

In dealing with continuous parameter second order processes, it is customary to assume that their mean and covariance functions are continuous and also to make some assumptions concerning the continuity of the process itself.

5.1.1. Continuity of the mean and covariance functions.

Let X(t), t ∈ T, be a continuous parameter second order process. By definition then, T is an interval having positive length. We assume in this chapter that

(i) μX(t), t ∈ T, is a continuous function of t

and that

(ii) rX(s, t), s ∈ T and t ∈ T, is jointly continuous in s and t.

These assumptions are satisfied in all the examples of the previous chapter and in virtually all other examples arising in practice.

Assumptions (i) and (ii) have the interesting consequence that the process X(t), t ∈ T, is continuous in mean square, i.e., that

(1)    lim_{s→t} E(X(s) − X(t))² = 0,   t ∈ T.



To verify (1), write

E(X(s) − X(t))² = [E(X(s) − X(t))]² + Var(X(s) − X(t))
               = [EX(s) − EX(t)]² + Var X(s) − 2 Cov(X(s), X(t)) + Var X(t)
               = (μX(s) − μX(t))² + rX(s, s) − 2rX(s, t) + rX(t, t).

It follows from (i) that

lim_{s→t} (μX(s) − μX(t))² = 0,   t ∈ T.

It follows from (ii) that rX(s, s) and rX(s, t) approach rX(t, t) as s → t, and hence that

lim_{s→t} (rX(s, s) − 2rX(s, t) + rX(t, t)) = 0.

Equation (1) follows immediately from these results.

Let Y(t), t ∈ T, be another continuous parameter second order process satisfying (i) and (ii). Then the cross-covariance function rXY(s, t), s ∈ T and t ∈ T, is jointly continuous in s and t. In other words,

(2)    lim_{u→s, v→t} Cov(X(u), Y(v)) = Cov(X(s), Y(t)),   s ∈ T and t ∈ T.

To verify (2) we write

Cov(X(s), Y(t)) = EX(s)Y(t) − μX(s)μY(t)   and   Cov(X(u), Y(v)) = EX(u)Y(v) − μX(u)μY(v).

The difference between these two covariances can be written as

(3)    Cov(X(u), Y(v)) − Cov(X(s), Y(t))
         = E(X(u) − X(s))Y(v) + EX(s)(Y(v) − Y(t))
           − (μX(u) − μX(s))μY(v) − μX(s)(μY(v) − μY(t)).

It follows from (i) applied to the two processes that

(4)    lim_{u→s, v→t} (μX(u) − μX(s))μY(v) = 0

and

(5)    lim_{v→t} μX(s)(μY(v) − μY(t)) = 0.

By Schwarz's inequality,

[E(X(u) − X(s))Y(v)]² ≤ E(X(u) − X(s))² E(Y(v))² = E(X(u) − X(s))² (rY(v, v) + (μY(v))²).


Thus by (1) and by (i) and (ii) applied to the Y(t) process,

(6)    lim_{u→s, v→t} E(X(u) − X(s))Y(v) = 0.

By a similar argument,

(7)    lim_{u→s, v→t} EX(s)(Y(v) − Y(t)) = 0.

Equation (2) follows immediately from (3)–(7).

5.1.2. Continuity of the sample functions. The random variables X(t), t ∈ T, are defined on some fixed probability space Ω. Temporarily we will use the notation X(t, ω) to denote the dependence on both t and ω. For each ω ∈ Ω the function X(t, ω), t ∈ T, defines a real-valued function of t, called the sample function of the process. Thus every ω ∈ Ω is associated with a unique sample function, and we can think of a stochastic process as a random sample function.

The sample functions from Example 5 of Chapter 4 all satisfy the assumption

(iii) X(t, ω), t ∈ T, is a continuous function of t.

Assumption (iii) is certainly reasonable in models of "continuously varying" physical processes such as Brownian motion. Portions of typical sample functions of the processes in Examples 2, 3, and 4 of Chapter 4 are shown in Figure 1. The sample functions of these integer-valued processes are not continuous. They are, however, piecewise continuous and at points of discontinuity take on their right-hand limit. Specifically, they satisfy the following three assumptions:

(iv) X(s, ω) has a finite limit as s approaches t from the left;
(v) X(s, ω) → X(t, ω) as s approaches t from the right;
(vi) the function X(t, ω), t ∈ T, has only a finite number of points of discontinuity on any closed, bounded subinterval of T.

In many contexts we can determine directly only that there is an event Ω1 ⊂ Ω such that P(Ω1) = 1 and (iv)–(vi) hold for all ω ∈ Ω1. In this case we say that the sample functions satisfy (iv)–(vi) with probability one. If we now replace the probability space Ω by Ω1, then (iv)–(vi) hold for all ω; similar remarks hold for assumption (iii). In effect we "throw out" an unwanted set of probability zero. Since this does not affect the joint distributions or the mean and covariance functions of the process, there is no reason for us to distinguish properties that hold "with probability one" from those that hold "for all ω ∈ Ω."


[Figure 1a: a typical sample function of the process in Example 2 of Chapter 4]
[Figure 1b: a typical sample function of the process in Example 3 of Chapter 4]
[Figure 1c: a typical sample function of the process in Example 4 of Chapter 4]

A precise discussion of sample function continuity for Gaussian processes is much more complicated than that for integer-valued processes. It has been shown that the sample functions of a continuous parameter Gaussian process either are continuous with probability one or with probability one are so highly discontinuous as to be unlikely to arise in a practical problem. There are sufficient conditions known on a mean function and a covariance function which guarantee that there is a Gaussian process having that mean and that covariance function and having continuous sample functions. In particular, it can be shown that there is a process having the properties of a Wiener process and having continuous sample functions. Norbert Wiener first proved this result. Several proofs are now known, but they are all too difficult to be included in a book at this level.

In the future, by the Wiener process with parameter σ², we will mean a process satisfying properties (i)–(iii) of Section 4.3 and having continuous sample functions.

5.2. Integration

In this section we will obtain formulas for the means and covariances of random variables and processes defined in terms of a continuous parameter second order process by means of integration. For a simple illustration where integration of processes arises naturally, consider a second order stationary process X(t), −∞ < t < ∞, having constant but unknown mean μ. Suppose we observe the process on a ≤ t ≤ b and wish to estimate μ. A simple and typically very good estimate is given by

μ̂ = (1/(b − a)) ∫_a^b X(t) dt.

Integration of processes will be used later in defining "white noise" and in solving differential equations having "white noise" or other processes as inputs.
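As a sketch of this estimator, one can time-average a simulated stationary path. Below we add a constant mean μ to the process X(t) = R cos(λt + Θ) of Exercise 8 of Chapter 4, so that EX(t) = μ, and approximate the integral by a Riemann sum; all numerical values (μ, λ, σ, the interval, the step and path counts) are illustrative choices.

```python
import math
import random

rng = random.Random(3)
mu, lam, sigma = 5.0, 3.0, 1.0   # hypothetical parameters
a, b, n = 0.0, 50.0, 5000        # observation window and step count

def time_average_estimate():
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))  # Rayleigh(sigma)
    # Midpoint Riemann sum for (1/(b-a)) * integral_a^b X(t) dt
    h = (b - a) / n
    total = sum(mu + r * math.cos(lam * (a + (k + 0.5) * h) + theta)
                for k in range(n))
    return total / n

est = sum(time_average_estimate() for _ in range(200)) / 200
print(round(est, 2))
```

The oscillating term averages out over a long window, so the estimate lands very close to μ.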

Let X(t), t ∈ T, be a continuous parameter second order process satisfying assumptions (i), (ii), and (iv)–(vi) of Section 5.1. Let f(t), a ≤ t ≤ b, be a piecewise continuous function, where [a, b] is a closed and bounded subinterval of T. For each ω ∈ Ω the function X(t, ω), a ≤ t ≤ b, is piecewise continuous, and hence f(t)X(t, ω), a ≤ t ≤ b, is also piecewise continuous. Since the ordinary integrals of calculus are well defined for piecewise continuous functions,

∫_a^b f(t)X(t, ω) dt

is well defined for each ω ∈ Ω. By using results from measure theory, it can be shown that this integral, as a function of ω, defines a random



variable. It can also be shown that the expectation of this random variable may be found by interchanging the order of integration and expectation. In other words,

(8)    E[∫_a^b f(t)X(t) dt] = ∫_a^b f(t)EX(t) dt = ∫_a^b f(t)μX(t) dt.

Here we have returned to our usual notational convention of omitting the dependence of X(t, ω) on ω.

Let f(t), a ≤ t ≤ b, and g(t), c ≤ t ≤ d, be two such piecewise continuous functions. In order to compute the expectation of the random variable

∫_a^b f(t)X(t) dt ∫_c^d g(t)X(t) dt,

we first rewrite it as the iterated integral

∫_a^b f(s) (∫_c^d g(t)X(s)X(t) dt) ds.

It is again permissible to interchange the order of integration and expectation. Thus

(9)    E[∫_a^b f(t)X(t) dt ∫_c^d g(t)X(t) dt] = ∫_a^b f(s) (∫_c^d g(t)E[X(s)X(t)] dt) ds.

Using (8) and (9), we conclude that

(10)    Cov(∫_a^b f(t)X(t) dt, ∫_c^d g(t)X(t) dt) = ∫_a^b f(s) (∫_c^d g(t)rX(s, t) dt) ds.

As an illustration of the use of (8) and (10), let t0 ∈ T and consider the process Y(t), t ∈ T, defined by

Y(t) = ∫_{t0}^t X(s) ds,   t ∈ T.

Then by (8)

μY(t) = ∫_{t0}^t μX(s) ds,   t ∈ T,

and by (10)

rY(s, t) = ∫_{t0}^s (∫_{t0}^t rX(u, v) dv) du,   s ∈ T and t ∈ T.


It follows from these formulas that the Y(t) process has continuous sample functions and continuous mean and covariance functions.

If X(t), t ∈ T, is a Gaussian process, then

∫_a^b f(t)X(t) dt

has a normal distribution. In proving this result it is first necessary to approximate the integral by a finite sum such as

((b − a)/n) Σ_{k=1}^n f(a + (b − a)k/n) X(a + (b − a)k/n).

This sum is normally distributed and converges to the integral as n → ∞. Using this and theorems from advanced probability theory, one can show that the integral is normally distributed.

More generally, let h(s, t), s ∈ S and a ≤ t ≤ b, be such that for each s in the interval S the function h(s, t), a ≤ t ≤ b, is a piecewise continuous function. Let X(t), t ∈ T, be a Gaussian process and set

Y(s) = ∫_a^b h(s, t)X(t) dt,   s ∈ S.

Then Y(s), s ∈ S, is a Gaussian process. For let s1, . . . , sn be in S and let a1, . . . , an be real numbers. Then

a1Y(s1) + · · · + anY(sn) = ∫_a^b (Σ_{k=1}^n ak h(sk, t)) X(t) dt

has a normal distribution, according to the preceding paragraph.

We will now use (9) to obtain a result that will be needed in Section 5.4. Let W(t), −∞ < t < ∞, be the Wiener process with parameter σ², and let f and g be continuously differentiable functions on the closed bounded interval [a, b]. We will show that

(11)    E[∫_a^b f′(t)(W(t) − W(a)) dt ∫_a^b g′(t)(W(t) − W(a)) dt] = σ² ∫_a^b (f(t) − f(b))(g(t) − g(b)) dt.

For a simple application of (11), set a = 0, b = 1, and f(t) = g(t) = t for 0 ≤ t ≤ 1. We conclude from (11) that

E(∫_0^1 W(t) dt)² = σ² ∫_0^1 (t − 1)² dt = σ²/3.
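This value is easy to corroborate by simulation: approximate ∫_0^1 W(t) dt with a Riemann sum on each simulated path and average the squares. A sketch (the value σ = 2 and the step and path counts are arbitrary):

```python
import random

rng = random.Random(4)
sigma, n_steps, n_paths = 2.0, 200, 8000
h = 1.0 / n_steps
acc = 0.0
for _ in range(n_paths):
    w, total = 0.0, 0.0
    for _ in range(n_steps):
        total += w * h  # left-endpoint Riemann sum for integral_0^1 W(t) dt
        w += rng.gauss(0.0, sigma * h ** 0.5)
    acc += total * total
second_moment = acc / n_paths
print(round(second_moment, 2), sigma ** 2 / 3)
```

The sample second moment should be close to σ²/3 up to discretization and sampling error.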



The random variable

∫_0^1 W(t) dt

has mean zero. It is normally distributed, since the Wiener process is a Gaussian process. Thus ∫_0^1 W(t) dt is normally distributed with mean zero and variance σ²/3.

We will now prove that (11) holds. By (9) and Formula (20) of Section 4.3 we need only show that

(12)    ∫_a^b f′(s) (∫_a^b g′(t) min(s − a, t − a) dt) ds = ∫_a^b (f(t) − f(b))(g(t) − g(b)) dt.

In verifying (12), we write the inner integral of the left side as

∫_a^s (t − a)g′(t) dt + (s − a) ∫_s^b g′(t) dt.

Integrating the first of these two integrals by parts, we rewrite this expression as

(t − a)g(t)|_a^s − ∫_a^s g(t) dt + (s − a)(g(b) − g(s)) = (s − a)g(b) − ∫_a^s g(t) dt = ∫_a^s (g(b) − g(t)) dt.

Thus the left side of (12) equals

∫_a^b f′(s) (∫_a^s (g(b) − g(t)) dt) ds.

Interchanging the order of integration, we get

∫_a^b (g(b) − g(t)) (∫_t^b f′(s) ds) dt = ∫_a^b (g(b) − g(t))(f(b) − f(t)) dt,

which equals the right side of (12).

5.3. Differentiation

Let X(t), t ∈ T, be a continuous parameter second order process satisfying (i) and (ii). We say that this process is differentiable if there is a second order process Y(t), t ∈ T, satisfying assumptions (i), (ii), and (iv)–(vi) of Section 5.1 and such that for t0 ∈ T

X(t) − X(t0) = ∫_{t0}^t Y(s) ds,   t ∈ T.

The Y(t) process is then called the derivative of the X(t) process and is denoted by X′(t), t ∈ T. Thus

(13)    X(t) − X(t0) = ∫_{t0}^t X′(s) ds,   t ∈ T.

It follows from (13) that the X(t) process has continuous sample functions and thus satisfies assumption (iii) of Section 5.1, and that

(d/dt)X(t) = X′(t)

holds except at the points of discontinuity of the X′(t) process. It also follows from (13) that

μX(t) − μX(t0) = ∫_{t0}^t μX′(s) ds,

and hence that

(14)    μX′(t) = (d/dt)μX(t),   t ∈ T.

In order to find the covariance function of the X′(t) process, we first consider a more general result involving cross-covariance functions. Let X(t), t ∈ T, and Y(t), t ∈ T, be two second order processes, and suppose that the X(t) process is differentiable and that the Y(t) process satisfies (i) and (ii). We will show that

(15)    rYX′(s, t) = (∂/∂t) rYX(s, t),   s ∈ T and t ∈ T,

and

(16)    rX′Y(s, t) = (∂/∂s) rXY(s, t),   s ∈ T and t ∈ T.

In order to verify (15) we choose t0 ∈ T and write

X(t) − X(t0) = ∫_{t0}^t X′(u) du,

from which it follows that

Y(s)(X(t) − X(t0)) = ∫_{t0}^t Y(s)X′(u) du.


Thus

(17)    E[Y(s)(X(t) − X(t0))] = ∫_{t0}^t E[Y(s)X′(u)] du.

Now

μX(t) − μX(t0) = ∫_{t0}^t μX′(u) du,

and hence

(18)    μY(s)(μX(t) − μX(t0)) = ∫_{t0}^t μY(s)μX′(u) du.

By subtracting (18) from (17) and rewriting the resulting expression in terms of covariance functions, we conclude that

(19)    rYX(s, t) − rYX(s, t0) = ∫_{t0}^t rYX′(s, u) du.

We saw in Section 5.1 that a cross-covariance function such as rYX′(s, u) is necessarily continuous. Thus for each fixed s we can differentiate (19) with respect to t, obtaining (15) as desired. Formula (16) follows from (15) by symmetry.

Let X(t), t ∈ T, be a differentiable second order process. From (15) and (16) we have

(20)    rXX′(s, t) = (∂/∂t) rXX(s, t) = (∂/∂t) rX(s, t)

and

rX′(s, t) = rX′X′(s, t) = (∂/∂s) rXX′(s, t),

which imply that the covariance function of the X′(t) process is given by

(21)    rX′(s, t) = (∂²/∂s ∂t) rX(s, t),   s ∈ T and t ∈ T.

Let X(t), t ∈ T, be a differentiable second order process. If the X′(t) process is itself differentiable, we denote its derivative by X″(t), t ∈ T. In this case we say that the X(t) process is twice differentiable and that its second derivative is the X″(t) process. Higher derivatives are similarly defined.

Let X(t), −∞ < t < ∞, be a differentiable, second order stationary process. Then μX′(t) = 0, −∞ < t < ∞, and (21) reduces to

rX′(s, t) = (∂²/∂s ∂t) rX(t − s) = −r″X(t − s).


Thus X′(t), −∞ < t < ∞, is also a second order stationary process in this case and

(22)    rX′(t) = −r″X(t),   −∞ < t < ∞.

We see from (22) that the covariance function rX(t), −∞ < t < ∞, is twice continuously differentiable.

Example 1. Let X(t), −∞ < t < ∞, be a differentiable second order stationary process. Show that X(t) and X′(t) are uncorrelated for all t.

From (20),

(23)    rXX′(s, t) = (∂/∂t) rX(t − s) = r′X(t − s).

Differentiating the symmetry equation rX(t) = rX(−t), we conclude that

r′X(t) = −r′X(−t),

and hence, by setting t = 0, that r′X(0) = 0. It now follows from (23) that

Cov(X(t), X′(t)) = rXX′(t, t) = r′X(0) = 0,

so that X(t) and X′(t) are uncorrelated.

Let X(t), t ∈ T, be a differentiable second order process. Then the random variables

(X(t + h) − X(t))/h

converge in mean square to X′(t) as h → 0; that is,

(24)    lim_{h→0} E((X(t + h) − X(t))/h − X′(t))² = 0,   t ∈ T.

In (24) it is understood that either T = (−∞, ∞) or, for t on the boundary of T, h is restricted to values such that t + h ∈ T.

To verify (24) we write

(X(t + h) − X(t))/h − X′(t) = (1/h)(∫_t^{t+h} X′(u) du − hX′(t)) = (1/h) ∫_t^{t+h} (X′(u) − X′(t)) du.


Thus by (9),

(25)    E((X(t + h) − X(t))/h − X′(t))² = (1/h²) ∫_t^{t+h} ∫_t^{t+h} E[(X′(u) − X′(t))(X′(v) − X′(t))] dv du.

It follows from Schwarz's inequality that

(26)    |E(X′(u) − X′(t))(X′(v) − X′(t))| ≤ √(E(X′(u) − X′(t))²) √(E(X′(v) − X′(t))²).

According to results in Section 5.1, a second order process such as the X′(t) process is continuous in mean square. This implies that for fixed t and fixed ε > 0 we can find a δ > 0 such that

(27)    E(X′(u) − X′(t))² ≤ ε   if |u − t| ≤ δ.

Let h be such that −δ ≤ h ≤ δ. Then by (26) and (27),

(28)    |E(X′(u) − X′(t))(X′(v) − X′(t))| ≤ ε

as u and v range from t to t + h. We see from (25) and (28) that

E((X(t + h) − X(t))/h − X′(t))² ≤ ε.

Since ε can be made arbitrarily small, this implies that (24) holds.

Let X(t), t ∈ T, be a differentiable second order process which is also a Gaussian process. Then X′(t), t ∈ T, is also a Gaussian process. To verify this result it is necessary to show that if t1, . . . , tn are in T and a1, . . . , an are real constants, then

Σ_{i=1}^n ai X′(ti)

is normally distributed. We will indicate why this is the case, but omit a detailed proof.

It follows from (24) that

lim_{h→0} E((X(ti + h) − X(ti))/h − X′(ti))² = 0,   i = 1, . . . , n.

It is not difficult to conclude from this that

(29)    lim_{h→0} E(Σ_{i=1}^n ai(X(ti + h) − X(ti))/h − Σ_{i=1}^n ai X′(ti))² = 0.



Now for each fixed h the random variable

Σ_{i=1}^n ai(X(ti + h) − X(ti))/h

is normally distributed. It can be shown that this, together with (29), implies that

Σ_{i=1}^n ai X′(ti)

is normally distributed.

It may come as a surprise to the reader that the Wiener process is not differentiable. The simplest way of seeing this is to observe from property (ii) in Section 4.3 that

E((W(t + h) − W(t))/h)² = σ²/|h|,

and hence that

(30)    lim_{h→0} E((W(t + h) − W(t))/h)² = ∞.

If the Wiener process were differentiable, then

lim_{h→0} E((W(t + h) − W(t))/h − W′(t))² = 0

would imply that

lim_{h→0} E((W(t + h) − W(t))/h)² = E(W′(t))²,

which would contradict (30).
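The blow-up in (30) is visible numerically: the sample mean of ((W(t + h) − W(t))/h)² scales like σ²/h as h shrinks. A sketch (the step sizes and sample count below are illustrative):

```python
import random

rng = random.Random(6)
sigma = 1.0
n = 20000
ests = []
for h in [0.1, 0.01, 0.001]:
    acc = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sigma * h ** 0.5)  # W(t + h) - W(t) ~ N(0, sigma^2 h)
        acc += (dw / h) ** 2
    ests.append(acc / n)  # estimates E((W(t+h) - W(t))/h)^2 = sigma^2 / h
print([round(e, 1) for e in ests])
```

Each tenfold reduction of h multiplies the estimated second moment by roughly ten, consistent with σ²/|h|.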

Figure 2



There are stronger senses in which the Wiener process is nondifferentiable. It has been shown that with probability one the sample function W(t, ω), −∞ < t < ∞, is not differentiable at even a single value of t. Thus with probability one the sample functions of the Wiener process are continuous, nowhere differentiable functions. The nondifferentiability of the Wiener process may seem especially surprising since it arose as a model for the Brownian motion of a particle. But the trajectory of a particle undergoing Brownian motion, observed under a microscope, actually does appear to be nowhere differentiable. Needless to say, it is difficult to portray a typical sample function of the Wiener process. An attempt at this has been made in Figure 2.

5.4. White noise

Let W(t), −∞ < t < ∞, be the Wiener process with parameter σ². Let a and b be finite numbers and let f be a continuously differentiable function on the closed interval from a to b. Since the Wiener process is not differentiable, the integral

∫_a^b f(t)W′(t) dt   or   ∫_a^b f(t) dW(t)

does not exist in the usual sense. Nevertheless, it is possible to give meaning to this integral. One way of doing so is to define the integral as

lim_{ε→0} ∫_a^b f(t) ((W(t + ε) − W(t))/ε) dt,

provided the indicated limit exists. To see that this limit does indeed exist and to evaluate it explicitly, we observe that

∫_a^b f(t) ((W(t + ε) − W(t))/ε) dt = ∫_a^b f(t) (d/dt) ((1/ε) ∫_t^{t+ε} W(s) ds) dt.

Integrating the right side of this equation by parts, we conclude that

(31)    ∫_a^b f(t) ((W(t + ε) − W(t))/ε) dt = [f(t) (1/ε) ∫_t^{t+ε} W(s) ds]_a^b − ∫_a^b f′(t) ((1/ε) ∫_t^{t+ε} W(s) ds) dt.

Since the Wiener process has continuous sample functions, it follows that the right side of (31) converges to

f(t)W(t)|_a^b − ∫_a^b f′(t)W(t) dt.



Thus we are led to define

∫_a^b f(t) dW(t)

as the limit of the right side of (31) as ε → 0, that is, by the formula

(32)    ∫_a^b f(t) dW(t) = f(b)W(b) − f(a)W(a) − ∫_a^b f′(t)W(t) dt.

Note that the right side of (32) is well defined and that (32) agrees with the usual integration by parts formula.

We regard (32) as the definition of the integral appearing on the left side of (32). The derivative of the Wiener process is called "white noise." It is not a stochastic process in the usual sense. Rather dW(t) = W′(t) dt is a "functional" that assigns values to the integral appearing on the left side of (32). Although we have given white noise a precise definition, it is not clear that it is useful for anything. We will see in Chapter 6, however, that white noise can be used to define certain stochastic differential equations, which are widely used in the physical sciences and especially in certain branches of engineering.

Since the Wiener proc�ess is a Gaussian procc�ss, it follows from (32) that

∫_a^b f(t) dW(t)

is normally distributed. This random variable has mean zero, as we see from (32) and the zero means of the Wiener process. To compute its variance we will show that if a < b and g is another continuously differentiable function on [a, b], then

(33) E[∫_a^b f(t) dW(t) ∫_a^b g(t) dW(t)] = σ² ∫_a^b f(t)g(t) dt.

Setting g = f, we see from (33) that for a < b

(34) Var(∫_a^b f(t) dW(t)) = σ² ∫_a^b f²(t) dt.

We start the proof of (33) by rewriting (32) as

(35) ∫_a^b f(t) dW(t) = f(b)(W(b) − W(a)) − ∫_a^b f′(t)(W(t) − W(a)) dt.

5.4. White noise 143

By applying this formula to g as well, we conclude that

(36) E[∫_a^b f(t) dW(t) ∫_a^b g(t) dW(t)]

= E[(f(b)(W(b) − W(a)) − ∫_a^b f′(t)(W(t) − W(a)) dt) × (g(b)(W(b) − W(a)) − ∫_a^b g′(t)(W(t) − W(a)) dt)].

We will evaluate the right side of (36) by breaking it up into four separate terms. The product of the two integrals was computed earlier in Section 5.2. There we found that

(37) E[∫_a^b f′(t)(W(t) − W(a)) dt ∫_a^b g′(t)(W(t) − W(a)) dt] = σ² ∫_a^b (f(t) − f(b))(g(t) − g(b)) dt.

Next we observe that by (20) of Chapter 4

−E[f(b)(W(b) − W(a)) ∫_a^b g′(t)(W(t) − W(a)) dt]

= −f(b) ∫_a^b g′(t)E[(W(b) − W(a))(W(t) − W(a))] dt

= −σ² f(b) ∫_a^b (t − a)g′(t) dt

= −σ² f(b) ([(t − a)g(t)]_a^b − ∫_a^b g(t) dt) = −σ² f(b) ((b − a)g(b) − ∫_a^b g(t) dt).

Consequently,

(38) −E[f(b)(W(b) − W(a)) ∫_a^b g′(t)(W(t) − W(a)) dt] = σ² ∫_a^b f(b)(g(t) − g(b)) dt.


Similarly we find that

(39) −E[g(b)(W(b) − W(a)) ∫_a^b f′(t)(W(t) − W(a)) dt] = σ² ∫_a^b g(b)(f(t) − f(b)) dt.

Finally we note that

E[f(b)(W(b) − W(a))g(b)(W(b) − W(a))] = σ²(b − a)f(b)g(b),

which we rewrite as

(40) E[f(b)(W(b) − W(a))g(b)(W(b) − W(a))] = σ² ∫_a^b f(b)g(b) dt.

By (36) the sum of the left sides of (37)-(40) equals the left side of (33). It is easily seen that the sum of the right sides of (37)-(40) equals the right side of (33). This proves (33).
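Formula (33) can also be checked by simulation. In the sketch below (illustrative Python; the function name, sample sizes, and tolerances are our own choices, not from the text), each trial discretizes one Wiener path on [a, b], forms both integrals as sums against the same increments, and averages the product over many trials.

```python
import math
import random

def check_formula_33(f, g, a, b, sigma=1.0, n=400, trials=4000, seed=2):
    """Monte Carlo check of (33):
    E[ int_a^b f dW * int_a^b g dW ] = sigma^2 int_a^b f(t) g(t) dt."""
    rng = random.Random(seed)
    dt = (b - a) / n
    sd = sigma * math.sqrt(dt)
    ts = [a + i * dt for i in range(n)]
    fv = [f(u) for u in ts]
    gv = [g(u) for u in ts]
    total = 0.0
    for _ in range(trials):
        # One set of Wiener increments, shared by both integrals.
        dws = [rng.gauss(0.0, sd) for _ in range(n)]
        total += (sum(x * d for x, d in zip(fv, dws))
                  * sum(x * d for x, d in zip(gv, dws)))
    empirical = total / trials
    exact = sigma ** 2 * sum(x * y * dt for x, y in zip(fv, gv))
    return empirical, exact
```

With f(t) = t and g ≡ 1 on [0, 1], the exact value is σ² ∫_0^1 t dt = 1/2, and the Monte Carlo average should land within a few percent of it.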

There are two more formulas that will be needed in the next chapter:

(41) E[∫_a^b f(t) dW(t) ∫_c^d g(t) dW(t)] = 0,  a < b ≤ c < d,

and

(42) E[∫_a^b f(t) dW(t) ∫_a^c g(t) dW(t)] = σ² ∫_a^b f(t)g(t) dt,  a < b < c.

In these formulas f and g are assumed to be continuously differentiable on the indicated intervals of integration.

To verify (41) we use (35) to write

E[∫_a^b f(t) dW(t) ∫_c^d g(t) dW(t)]

= E[(f(b)(W(b) − W(a)) − ∫_a^b f′(s)(W(s) − W(a)) ds) × (g(d)(W(d) − W(c)) − ∫_c^d g′(t)(W(t) − W(c)) dt)].

It follows from Formula (18) of Chapter 4 that this expectation vanishes. To verify (42) we observe first that

∫_a^c g(t) dW(t) = ∫_a^b g(t) dW(t) + ∫_b^c g(t) dW(t),


which is a direct application of the definition of these integrals given in (32). Thus

E[∫_a^b f(t) dW(t) ∫_a^c g(t) dW(t)] = E[∫_a^b f(t) dW(t) (∫_a^b g(t) dW(t) + ∫_b^c g(t) dW(t))]

= E[∫_a^b f(t) dW(t) ∫_a^b g(t) dW(t)] + E[∫_a^b f(t) dW(t) ∫_b^c g(t) dW(t)],

which by (33) and (41) equals the right side of (42).

Example 2. Let X(t), t ≥ 0, be defined by

X(t) = ∫_0^t e^{α(t−u)} dW(u),  0 ≤ t < ∞,

where α is a real constant. Find the mean and covariance function of the X(t) process.

The X(t) process has zero means. For 0 ≤ s ≤ t its covariance function is found by (42) to be

E[X(s)X(t)] = E[∫_0^s e^{α(s−u)} dW(u) ∫_0^t e^{α(t−u)} dW(u)]

= e^{α(s+t)} E[∫_0^s e^{−αu} dW(u) ∫_0^t e^{−αu} dW(u)]

= σ² e^{α(s+t)} ∫_0^s e^{−2αu} du = (σ²/2α)(e^{α(s+t)} − e^{α(t−s)}).

Thus by symmetry

r_X(s, t) = (σ²/2α)(e^{α(s+t)} − e^{α|t−s|}),  s ≥ 0 and t ≥ 0.

In particular, by setting s = t, we find that

Var(X(t)) = (σ²/2α)(e^{2αt} − 1),  t ≥ 0.
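Example 2 invites a Monte Carlo check (an illustrative sketch of ours, not part of the text; all names and parameters are invented for the illustration): simulate X(t) = ∫_0^t e^{α(t−u)} dW(u) by summing weighted Wiener increments and compare the sample variance with (σ²/2α)(e^{2αt} − 1).

```python
import math
import random
import statistics

def example2_variance(alpha=-0.8, t=2.0, sigma=1.0, n=400, trials=4000, seed=3):
    """Monte Carlo check of Example 2:
    Var X(t) = sigma^2 (e^{2 alpha t} - 1) / (2 alpha)."""
    rng = random.Random(seed)
    dt = t / n
    sd = sigma * math.sqrt(dt)
    # Weights e^{alpha (t - u)} on a left-point grid for u in [0, t).
    weights = [math.exp(alpha * (t - i * dt)) for i in range(n)]
    samples = [sum(w * rng.gauss(0.0, sd) for w in weights)
               for _ in range(trials)]
    empirical = statistics.pvariance(samples)
    exact = sigma ** 2 * (math.exp(2 * alpha * t) - 1) / (2 * alpha)
    return empirical, exact
```

The empirical and exact variances should agree up to sampling and discretization error.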


Let f be a continuously differentiable function on (−∞, b] such that

∫_{−∞}^b f²(t) dt < ∞.

It can be shown that

∫_{−∞}^b f(t) dW(t) = lim_{a→−∞} ∫_a^b f(t) dW(t)

exists and is finite with probability one, and that the random variable

∫_{−∞}^b f(t) dW(t)

is normally distributed with mean zero and variance

σ² ∫_{−∞}^b f²(t) dt.

Let g be a continuously differentiable function on (−∞, c] such that

∫_{−∞}^c g²(t) dt < ∞.

It can be shown that under these conditions (42) holds with a = −∞,

i.e.,

(43) E[∫_{−∞}^b f(t) dW(t) ∫_{−∞}^c g(t) dW(t)] = σ² ∫_{−∞}^{min(b,c)} f(t)g(t) dt.

Example 3. Let X(t), −∞ < t < ∞, be defined by

X(t) = ∫_{−∞}^t e^{α(t−u)} dW(u),  −∞ < t < ∞,

where α is a negative constant. Find the mean and covariance function of the X(t) process and show that it is a second order stationary process.

Since

∫_{−∞}^t e^{2α(t−u)} du = lim_{a→−∞} ∫_a^t e^{2α(t−u)} du = lim_{a→−∞} (1 − e^{2α(t−a)})/(−2α) = −1/(2α)


is finite, we see from the remarks leading to (43) that the X(t) process is well defined. It has zero means. For s ≤ t

r_X(s, t) = σ² ∫_{−∞}^s e^{α(s−u)} e^{α(t−u)} du

= σ² e^{α(s+t)} ∫_{−∞}^s e^{−2αu} du = (σ²/(−2α)) e^{α(t−s)}.

In general,

r_X(s, t) = (σ²/(−2α)) e^{α|t−s|}.

This shows that the X(t) process is a second order stationary process having covariance function

r_X(t) = (σ²/(−2α)) e^{α|t|},  −∞ < t < ∞.
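The stationary covariance of Example 3 can likewise be approximated numerically. The sketch below (ours; the truncation point, names, and sample sizes are arbitrary choices, not from the text) replaces the lower limit −∞ by a finite point far in the past, which is harmless because α < 0 makes the neglected tail exponentially small, and then estimates Cov(X(s), X(t)) from repeated sample paths.

```python
import math
import random

def example3_covariance(alpha=-1.0, s=0.5, t=1.5, sigma=1.0,
                        lower=-8.0, n=1000, trials=2000, seed=4):
    """Monte Carlo check of Example 3:
    Cov(X(s), X(t)) = sigma^2 e^{alpha |t - s|} / (-2 alpha)."""
    rng = random.Random(seed)
    dt = (t - lower) / n
    sd = sigma * math.sqrt(dt)
    us = [lower + i * dt for i in range(n)]
    # Integration weights: X(s) uses increments with u < s only.
    ws = [math.exp(alpha * (s - u)) if u < s else 0.0 for u in us]
    wt = [math.exp(alpha * (t - u)) for u in us]
    xs, xt = [], []
    for _ in range(trials):
        dws = [rng.gauss(0.0, sd) for _ in range(n)]  # shared increments
        xs.append(sum(a_ * d for a_, d in zip(ws, dws)))
        xt.append(sum(b_ * d for b_, d in zip(wt, dws)))
    mx = sum(xs) / trials
    my = sum(xt) / trials
    empirical = sum((x - mx) * (y - my) for x, y in zip(xs, xt)) / trials
    exact = sigma ** 2 * math.exp(alpha * abs(t - s)) / (-2 * alpha)
    return empirical, exact
```

For α = −1 and |t − s| = 1 the exact covariance is e^{−1}/2 ≈ 0.184, and the estimate should fall close to it.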

Summary. Under appropriate assumptions on f,

(44) E[∫_{−∞}^∞ f(t) dW(t)] = 0,

and under appropriate assumptions on f and g,

(45) E[∫_{−∞}^∞ f(t) dW(t) ∫_{−∞}^∞ g(t) dW(t)] = σ² ∫_{−∞}^∞ f(t)g(t) dt.

Most of these "appropriate" assumptions can be eliminated by using more sophisticated concepts. However, for (44) to hold, it is essential that

∫_{−∞}^∞ f²(t) dt < ∞,

and for (45) to hold, it is essential that

∫_{−∞}^∞ f²(t) dt < ∞  and  ∫_{−∞}^∞ g²(t) dt < ∞.

Exercises

1 Let X(t), t ∈ T, and Y(t), t ∈ T, be continuous parameter second order processes whose mean and covariance functions satisfy conditions (i) and (ii) of Section 5.1. Show that the mean and covariance functions of Z(t) = X(t) + Y(t), t ∈ T, also satisfy (i) and (ii).


2 Find the correlation between W(t) and

∫_0^1 W(s) ds

for 0 ≤ t ≤ 1.

3 Find the mean and variance of

4 Set

X(t) = ∫_0^t W(s) ds,  t ≥ 0.

Find the mean and covariance function of the X(t) process.

5 Set

X(t) = ∫_t^{t+1} (W(s) − W(t)) ds,  −∞ < t < ∞.

Show that this is a second order stationary process having mean zero and find r_X(t), −∞ < t < ∞.

6 Let X(t), −∞ < t < ∞, be a second order stationary process satisfying assumptions (ii) and (iv)-(vi).

(a) Show that

Var((1/T) ∫_0^T X(t) dt) = (2/T) ∫_0^T r_X(t)(1 − t/T) dt,  T > 0.

(b) Show that

Var((1/T) ∫_0^T X(t) dt) ≤ r_X(0),  T > 0.

(c) Show that

Var((1/T) ∫_0^T X(t) dt) ≤ (2/T) ∫_0^T |r_X(t)| dt,

and hence that if lim_{t→∞} r_X(t) = 0, then

lim_{T→∞} Var((1/T) ∫_0^T X(t) dt) = 0.

7 Let X(t), −∞ < t < ∞, be as in Exercise 6 and suppose that

∫_{−∞}^∞ |r_X(t)| dt < ∞.


Use the result of Exercise 6(a) to show that

lim_{T→∞} T Var((1/T) ∫_0^T X(t) dt) = 2 ∫_0^∞ r_X(t) dt = ∫_{−∞}^∞ r_X(t) dt.

Hint: Observe that for 0 < δ < 1

∫_0^T |r_X(t)| (t/T) dt ≤ δ ∫_0^∞ |r_X(t)| dt + ∫_{δT}^∞ |r_X(t)| dt.


8 Let X(t), −∞ < t < ∞, be a stationary Gaussian process satisfying properties (i)-(iii) and such that lim_{t→∞} r_X(t) = 0. Show that if EX(t) = 0, then

lim_{T→∞} E((1/T) ∫_0^T X²(t) dt − Var X(0))² = 0.

Hint: Use Exercise 6 and Exercise 13 of Chapter 4.

9 Let X(t), −∞ < t < ∞, be a second order stationary process satisfying assumptions (iv)-(vi) and having constant but unknown mean μ and covariance function

r_X(t) = α e^{−β|t|},  −∞ < t < ∞,

where α and β are positive constants. For T > 0 set

X̄ = (1/T) ∫_0^T X(t) dt.

(a) Show that X̄ is an unbiased estimator of μ (i.e., EX̄ = μ) and that

Var(X̄) = (2α/(βT))(1 − (1 − e^{−βT})/(βT)).

(b) Set

μ̂ = (X(0) + X(T) + β ∫_0^T X(t) dt)/(2 + βT).

Show that μ̂ is an unbiased estimator of μ and that

Var(μ̂) = 2α/(2 + βT).

It can be shown that μ̂ has minimum variance among all "linear" unbiased estimators of μ based on X(t), 0 ≤ t ≤ T. Since Var(X̄) is almost as small as Var(μ̂), the sample mean X̄ is a very "efficient" linear estimator of μ.


(c) Show that

lim_{T→∞} Var(μ̂)/Var(X̄) = 1.

This says that X̄ is an "asymptotically efficient" estimator of μ.

10 Let X(t), −∞ < t < ∞, be an n-times differentiable second order process and let Y(t), −∞ < t < ∞, be an m-times differentiable second order process. Show that

11 Let X(t), −∞ < t < ∞, be an n-times differentiable second order stationary process. Show that X^{(n)}(t), −∞ < t < ∞, is a second order stationary process and that

r_{X^{(n)}}(t) = (−1)ⁿ r_X^{(2n)}(t).

12 Let X(t), −∞ < t < ∞, be a twice differentiable second order stationary process. In terms of r_X(t), −∞ < t < ∞, find: (a) r_{XX″}(s, t), (b) r_{X′X″}(s, t), (c) r_{X″}(s, t).

13 Let X(t), −∞ < t < ∞, be as in Exercise 12 and set Y(t) = X″(t) + X(t), −∞ < t < ∞. Show that the Y(t) process is a second order stationary process, and find its mean and covariance function in terms of those of the X(t) process.

14 Find

∫_a^b c dW(t)

explicitly in terms of the Wiener process.

15 Find the mean and variance of

X = ∫_0^1 t dW(t)  and  Y = ∫_0^1 t² dW(t),

and find the correlation between these two random variables.

16 Find the covariance function of the X(t) process in each of the following cases:

(a) X(t) = ∫_0^t s dW(s),  t ≥ 0;

(b) X(t) = ∫_0^1 cos(ts) dW(s),  −∞ < t < ∞;

(c) X(t) = ∫_{t−1}^t (t − s) dW(s),  −∞ < t < ∞.


17 Let X(t), 0 ≤ t < ∞, be the process defined in Example 2 and set

Y(t) = ∫_0^t X(s) ds,  t ≥ 0.

(a) Show that

Y(t) = ∫_0^t ((e^{α(t−u)} − 1)/α) dW(u),  t ≥ 0.

(b) Find Var Y(t).


6

Stochastic Differential Equations, Estimation Theory, and Spectral Distributions

Recall that we introduced the Wiener process as a mathematical model for the motion of a particle subject to molecular bombardment. By using stochastic differential equations we can find other models for this physical process.

Let X(t) represent the position at time t of a particle which moves along a straight line (alternatively, we could let X(t) represent one coordinate of the position in space). Then X′(t) and X″(t) represent the velocity and acceleration of the particle at time t. Let m denote the mass of the particle and let F(t) denote the force acting on the particle at time t. By Newton's law

(1) F(t) = mX″(t).

We will consider three types of forces:

(i) a frictional force −fX′(t), due to the viscosity of the medium, proportional to the velocity and having opposite direction;

(ii) a restoring force −kX(t), as in a pendulum or spring, proportional to the distance from the origin and directed toward the origin;

(iii) an external force ξ(t), independent of the motion of the particle.

In short we consider a total force of the form

(2) F(t) = −fX′(t) − kX(t) + ξ(t),

where f and k are nonnegative constants. We combine (1) and (2) to obtain the differential equation

(3) mX″(t) + fX′(t) + kX(t) = ξ(t).

Suppose that the external force is due to some random effect. Then we can think of ξ(t) as a stochastic process. In this case X(t) is also a stochastic process and (3) is a stochastic differential equation relating these two processes. If the external force is due to molecular bombardment, then physical reasoning leads to the conclusion that this external force is of the form of white noise with a suitable


parameter σ². In this case X(t) is a stochastic process satisfying the stochastic differential equation

(4) mX″(t) + fX′(t) + kX(t) = W′(t),

where W′(t) is white noise with parameter σ². In Sections 6.1 and 6.2, when we discuss differential equations such as (4) which involve white noise, we will define precisely what is meant by a solution to such a differential equation.

There are areas other than those related directly to molecular bombardment of particles where stochastic differential equations involving white noise arise. Consider, for example, the simple electrical circuit shown in Figure 1 consisting of a

[Figure 1: a series circuit containing a resistance R, an inductance L, a capacitance C, and a driving electromotive force ξ(t).]

resistance R, an inductance L, a capacitance C, and a driving electromotive force ξ(t) in series. Let X(t) denote the voltage drop across the capacitor at time t. By Kirchhoff's second law X(t) satisfies the differential equation

LCX″(t) + RCX′(t) + X(t) = ξ(t).

Even in the absence of a driving electromotive force there will still be a small voltage source known as "thermal noise" due to the thermal agitation of the electrons in the resistor. Physical reasoning leads to the conclusion that this thermal noise is also of the form of white noise. In this case the voltage drop satisfies the stochastic differential equation

LCX″(t) + RCX′(t) + X(t) = W′(t).

One can obtain higher order differential equations by considering more complicated electrical or mechanical systems. We will consider an nth order stochastic differential equation

(5) a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = W′(t),


where a₀, a₁, …, aₙ are real constants with a₀ ≠ 0 and W′(t) is white noise with parameter σ².

In Section 6.1 we will consider (5) in detail for n = 1. There we will be able to illustrate the techniques for handling (5) in the simplest possible setting.

In Section 6.2 we will describe the corresponding results for general n, giving the full solution to (5) for n = 2. We will also describe what happens when the right side of (5) is replaced by a second order stationary process.

In Section 6.3 we will discuss some elementary principles of estimation theory. We will illustrate these principles by using them to predict in an optimal manner future values of solutions to stochastic differential equations.

In Section 6.4 we will describe the use of Fourier transforms in computing covariance functions of second order stationary processes. As an application of these techniques we will compute the Fourier transform of the covariance function of a second order stationary process arising as a solution to (5).

6.1. First order differential equations

In this section we will consider processes which satisfy the first order stochastic differential equation

(6) a₀X′(t) + a₁X(t) = W′(t),

where a₀ and a₁ are real constants with a₀ ≠ 0 and W′(t) is white noise with parameter σ².

For an example of such a process, let X(t) be the position process governed by

mX″(t) + fX′(t) + kX(t) = W′(t).

If there is no restoring force, then k = 0 and this equation becomes

mX″(t) + fX′(t) = W′(t).

Let V(t) = X′(t) denote the velocity of the particle at time t. Since V′(t) = X″(t), we see that the velocity process satisfies the differential equation

mV′(t) + fV(t) = W′(t),

which is of the same form as (6). Integrating the velocity process recovers the original position process. One can also find an example of a process satisfying a first order stochastic differential equation by considering the voltage process when there is no inductance in the network.

In trying to find solutions to (6) we first observe that it is not really well defined, since white noise does not exist as a stochastic process having


sample functions in the usual sense. Ignoring this difficulty for the moment, we "formally" integrate both sides of (6) from t₀ to t and obtain

(7) a₀(X(t) − X(t₀)) + a₁ ∫_{t₀}^t X(s) ds = W(t) − W(t₀).

Equation (7) is well defined, since for any point ω ∈ Ω, the Wiener process sample function W(t) = W(t, ω) is a well defined continuous function. By a solution to (6) on an interval containing the point t₀, we mean a stochastic process X(t) defined on that interval having continuous sample functions and satisfying (7).

In order to solve (7) we proceed through a series of reversible steps. We first rewrite this equation as

X(t) + (a₁/a₀) ∫_{t₀}^t X(s) ds = X(t₀) − W(t₀)/a₀ + W(t)/a₀.

Multiplying both sides of this equation by e^{−αt}, where

α = −a₁/a₀,

we find that

e^{−αt}X(t) − α e^{−αt} ∫_{t₀}^t X(s) ds = (X(t₀) − W(t₀)/a₀) e^{−αt} + e^{−αt} W(t)/a₀,

which we rewrite as

(d/dt)(e^{−αt} ∫_{t₀}^t X(s) ds) = (X(t₀) − W(t₀)/a₀) e^{−αt} + e^{−αt} W(t)/a₀.

Integrating both sides of this equation from t₀ to t, we conclude that

e^{−αt} ∫_{t₀}^t X(s) ds = (X(t₀) − W(t₀)/a₀)(e^{−αt₀} − e^{−αt})/α + ∫_{t₀}^t e^{−αs} W(s)/a₀ ds,

or equivalently,

∫_{t₀}^t X(s) ds = (X(t₀) − W(t₀)/a₀)(e^{α(t−t₀)} − 1)/α + e^{αt} ∫_{t₀}^t e^{−αs} W(s)/a₀ ds.

By differentiation we see that

(8) X(t) = (X(t₀) − W(t₀)/a₀) e^{α(t−t₀)} + W(t)/a₀ + (α/a₀) ∫_{t₀}^t e^{α(t−s)} W(s) ds.

Conversely, since these steps are all reversible, we see that for any choice of X(t₀), the right side of (8) defines a solution to (7). Thus (8) represents the general form of the solution to (7).


By using (32) of Chapter 5, we can rewrite (8) as

(9) X(t) = X(t₀) e^{α(t−t₀)} + (1/a₀) ∫_{t₀}^t e^{α(t−s)} dW(s).

Let C be any random variable. The process defined by

X(t) = C e^{α(t−t₀)} + (1/a₀) ∫_{t₀}^t e^{α(t−s)} dW(s)

is such that X(t₀) = C, and hence (9) holds. It is the unique stochastic process satisfying (7) and the initial condition X(t₀) = C. The randomness of the solution to (6) can be thought of as being caused by both the white noise term in the differential equation (6) and the randomness of the initial condition.

In many applications the initial value X(t₀) is just some constant x₀ independent of ω ∈ Ω. In this case (9) becomes

(10) X(t) = x₀ e^{α(t−t₀)} + (1/a₀) ∫_{t₀}^t e^{α(t−s)} dW(s).

This process is a Gaussian process and its mean and covariance functions are readily computed. Suppose for simplicity that X(t), t ≥ 0, is the solution to (6) on [0, ∞) satisfying the initial condition X(0) = x₀. Then

X(t) = x₀ e^{αt} + (1/a₀) ∫_0^t e^{α(t−s)} dW(s),  t ≥ 0.

Since integrals with respect to white noise have mean zero,

(11) μ_X(t) = x₀ e^{αt},  t ≥ 0.

From Example 2 of Chapter 5 and the formula −2αa₀² = 2a₀a₁, we see that

(12) r_X(s, t) = (σ²/(2a₀a₁))(e^{α|t−s|} − e^{α(s+t)}),  s ≥ 0 and t ≥ 0.

In particular,

Var(X(t)) = (σ²/(2a₀a₁))(1 − e^{2αt}),  t ≥ 0.
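Equations (10)-(12) can be illustrated with a simple Euler scheme (a sketch of ours, not from the text; the function name and parameter values are invented for the illustration): discretize a₀X′(t) + a₁X(t) = W′(t) as X_{k+1} = X_k + αX_k Δt + ΔW_k/a₀, with α = −a₁/a₀, and compare the sample mean and variance of X(t) with x₀e^{αt} and (σ²/(2a₀a₁))(1 − e^{2αt}).

```python
import math
import random
import statistics

def euler_first_order(a0=1.0, a1=0.5, x0=2.0, t=3.0, sigma=1.0,
                      n=600, trials=3000, seed=5):
    """Euler scheme for a0 X'(t) + a1 X(t) = W'(t), X(0) = x0, compared
    with the exact moments implied by (10)-(12):
    E X(t) = x0 e^{alpha t},
    Var X(t) = sigma^2 (1 - e^{2 alpha t}) / (2 a0 a1), alpha = -a1/a0 < 0."""
    rng = random.Random(seed)
    dt = t / n
    alpha = -a1 / a0
    sd = sigma * math.sqrt(dt)
    finals = []
    for _ in range(trials):
        x = x0
        for _ in range(n):
            x += alpha * x * dt + rng.gauss(0.0, sd) / a0
        finals.append(x)
    mean_exact = x0 * math.exp(alpha * t)
    var_exact = sigma ** 2 * (1 - math.exp(2 * alpha * t)) / (2 * a0 * a1)
    return (statistics.fmean(finals), mean_exact,
            statistics.pvariance(finals), var_exact)
```

Both the empirical mean and the empirical variance should match the exact values up to discretization and sampling error.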

We assume throughout the remainder of this section that α = −a₁/a₀ is negative. Then

(13) X₀(t) = (1/a₀) ∫_{−∞}^t e^{α(t−s)} dW(s)


is well defined as we saw in our discussion of Example 3 of Chapter 5.

Also for −∞ < t < ∞

X₀(t) = (1/a₀) ∫_{−∞}^0 e^{α(t−s)} dW(s) + (1/a₀) ∫_0^t e^{α(t−s)} dW(s)

= X₀(0)e^{αt} + (1/a₀) ∫_0^t e^{α(t−s)} dW(s),

which agrees with (9) for t₀ = 0. Thus X₀(t), −∞ < t < ∞, satisfies (6) on (−∞, ∞). Our goal is to demonstrate that X₀(t) is the only second order stationary process to do so. We see from Example 3 of Chapter 5 that X₀(t) is a second order stationary process having zero means and covariance function

(14) r_{X₀}(t) = (σ²/(2a₀a₁)) e^{α|t|},  −∞ < t < ∞.

It is not difficult to show that the X₀(t) process is a Gaussian process. Let X(t), −∞ < t < ∞, be any solution to (6) on (−∞, ∞). Then

X(t) = X(0)e^{αt} + (1/a₀) ∫_0^t e^{α(t−s)} dW(s),  −∞ < t < ∞,

and

X₀(t) = X₀(0)e^{αt} + (1/a₀) ∫_0^t e^{α(t−s)} dW(s),  −∞ < t < ∞.

By subtracting the second equation from the first, we conclude that

(15) X(t) = (X(0) − X₀(0))e^{αt} + X₀(t).

In other words, by letting C denote the random variable X(0) − X₀(0), we find that

(16) X(t) = Ce^{αt} + X₀(t).

Conversely, if C is any random variable, then (16) represents a solution to (6). For (15) follows from (16) and the remainder of the above steps are reversible. We see, therefore, that (16) represents the general solution to (6), where C denotes an arbitrary random variable.

We will show next that the X(t) process given by (16) is a second order stationary process if and only if C = 0 with probability one. Since exceptional sets of probability zero are of no concern, we can restate this result by saying that the unique second order stationary process which


satisfies (6) on (−∞, ∞) is given by (13), which has zero means and covariance function given by (14).

To verify these results, let C be a random variable such that the X(t) process given by (16) is a second order stationary process. Solving (16) for C, we find that

C = e^{−αt}(X(t) − X₀(t)).

It follows (see Exercise 1) that

EC² ≤ 2e^{−2αt}(E(X(t))² + E(X₀(t))²).

Now E(X(t))² = E(X(0))² and E(X₀(t))² = E(X₀(0))², since X(t), −∞ < t < ∞, and X₀(t), −∞ < t < ∞, are second order stationary processes. Thus

EC² ≤ 2e^{−2αt}(E(X(0))² + E(X₀(0))²).

Letting t → −∞ in this inequality, and recalling that α < 0, we conclude that EC² = 0. This implies that C = 0 with probability one, as desired.

Let X(t) be a second order process satisfying (6) on [0, ∞). Then (16) holds on [0, ∞), where C is a random variable having finite second moment. Thus for t ≥ 0

E(X(t) − X₀(t))² = E(Ce^{αt})² = e^{2αt}EC²,

and hence

(17) lim_{t→+∞} E(X(t) − X₀(t))² = 0.

Since

E(X(t) − X₀(t))² = (EX(t) − EX₀(t))² + Var(X(t) − X₀(t))

= (EX(t))² + Var(X(t) − X₀(t)) ≥ (EX(t))²,

we see from (17) that

(18) lim_{t→+∞} μ_X(t) = 0.

It follows from (17) and Schwarz's inequality (see the proof of Equation (2) of Chapter 5) that

(19) lim_{s,t→+∞} (r_X(s, t) − r_{X₀}(s, t)) = 0.

We summarize (17)-(19): any second order process X(t) that satisfies (6) on [0, ∞) is asymptotically equal to the second order stationary solution X₀(t) of (6) on (−∞, ∞), which has zero means and covariance function given by (14).


Example 1. Let X(t), 0 ≤ t < ∞, be the solution to (6) on [0, ∞) satisfying the initial condition X(0) = x₀, where x₀ is some real constant. From (11), (12), and (14), we see directly that

lim_{t→+∞} μ_X(t) = lim_{t→+∞} x₀e^{αt} = 0

and that

lim_{s,t→+∞} (r_X(s, t) − r_{X₀}(s, t)) = lim_{s,t→+∞} −(σ²/(2a₀a₁)) e^{α(s+t)} = 0.

6.2. Differential equations of order n

In this section we will describe the extensions of the results of Section 6.1 to solutions of nth order stochastic differential equations. Before doing so, however, we will briefly review the deterministic theory in a form convenient for our purposes.

Consider the homogeneous differential equation

(20) a₀x^{(n)}(t) + a₁x^{(n−1)}(t) + ··· + aₙx(t) = 0,

where a₀, a₁, …, aₙ are real constants with a₀ ≠ 0. By a solution to (20) on an interval, we mean a function φ(t) which is n times differentiable and such that

a₀φ^{(n)}(t) + a₁φ^{(n−1)}(t) + ··· + aₙφ(t) = 0

on that interval. For each j, 1 ≤ j ≤ n, there is a solution φⱼ to the homogeneous differential equation on (−∞, ∞) such that

φⱼ^{(k)}(0) = 1 for k = j − 1,  and  φⱼ^{(k)}(0) = 0 for 0 ≤ k ≤ n − 1 and k ≠ j − 1.

These functions are real-valued. If n = 1, then φ₁(t) = e^{αt}, where α = −a₁/a₀. In Section 6.2.1 we will find formulas for φ₁ and φ₂ when n = 2.

For any choice of the n numbers c₁, …, cₙ, the function

x = c₁φ₁ + ··· + cₙφₙ

is the unique solution to (20) satisfying the initial conditions

x(0) = c₁, …, x^{(n−1)}(0) = cₙ.

We can write this solution in the form

(21) x(t) = x(0)φ₁(t) + ··· + x^{(n−1)}(0)φₙ(t).


The polynomial

p(r) = a₀rⁿ + a₁r^{n−1} + ··· + aₙ

is called the characteristic polynomial of the left side of (20). By the fundamental theorem of algebra, it can be factored as

p(r) = a₀(r − r₁) ··· (r − rₙ),

where r₁, …, rₙ are roots of the equation p(r) = 0. These roots are not necessarily distinct and may be complex-valued. If the roots are distinct, then

e^{r₁t}, …, e^{rₙt}

are solutions to (20), and any solution to (20) can be written as a linear combination of these solutions (i.e., these solutions form a basis for the space of all solutions to (20)). If root rᵢ is repeated nᵢ times in the factorization of the characteristic polynomial, then

e^{rᵢt}, te^{rᵢt}, …, t^{nᵢ−1}e^{rᵢt}

are all solutions to (20). As i varies, we obtain Σᵢ nᵢ = n solutions in this way, which again form a basis for the space of all solutions to (20).

The left side of (20) is stable if every solution to (20) vanishes at ∞. The specific form of the solutions to (20) described in the previous paragraph shows that the left side of (20) is stable if and only if the roots of the characteristic polynomial all have negative real parts.

Consider next the nonhomogeneous differential equation

(22) a₀x^{(n)}(t) + a₁x^{(n−1)}(t) + ··· + aₙx(t) = y(t)

for a continuous function y(t). To find the general solution to (22) on an interval, we need only find one solution to (22) and add the general solution to the corresponding homogeneous differential equation.

One method of finding a specific solution to (22) involves the impulse response function h(t), t ≥ 0, defined as that solution to the homogeneous differential equation (20) satisfying the initial conditions

h(0) = ··· = h^{(n−2)}(0) = 0  and  h^{(n−1)}(0) = 1/a₀.

It is convenient to define h(t) for all t by setting h(t) = 0, t < 0. It follows from (21) that

h(t) = φₙ(t)/a₀,  t ≥ 0,  and  h(t) = 0,  t < 0.

The function

x(t) = ∫_{t₀}^t h(t − s)y(s) ds


is easily shown to be the solution to (22) on an interval containing t₀ as its left endpoint and satisfying the initial conditions

x(t₀) = ··· = x^{(n−1)}(t₀) = 0.

Suppose now that the left side of (22) is stable. Then h(t) → 0 "exponentially fast" as t → ∞ and, in particular,

(23) ∫_{−∞}^∞ |h(t)| dt < ∞  and  ∫_{−∞}^∞ h²(t) dt < ∞.

If y(t), −∞ < t < ∞, is continuous and does not grow too fast as t → −∞, e.g., if

∫_{−∞}^0 e^{ct}|y(t)| dt < ∞  for all c > 0,

then

x(t) = ∫_{−∞}^t h(t − s)y(s) ds = ∫_{−∞}^∞ h(t − s)y(s) ds

defines a solution to (22) on (−∞, ∞). (The reason h(t) is called the "impulse response function" is that if y(t), −∞ < t < ∞, is a "unit impulse at time 0," then the solution to (22) is

x(t) = ∫_{−∞}^∞ h(t − s)y(s) ds = h(t),

so that h(t) is the response at time t to a unit impulse at time 0.)

With this background we are now ready to discuss the nth order stochastic differential equation

(24) a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = W′(t),

where W′(t) is white noise with parameter σ². This equation is not well defined in its original form. We say that the stochastic process X(t) is a solution to (24) on an interval containing the point t₀ if it is n − 1 times differentiable on that interval and satisfies the integrated form of (24), namely,

(25) a₀(X^{(n−1)}(t) − X^{(n−1)}(t₀)) + ··· + aₙ₋₁(X(t) − X(t₀)) + aₙ ∫_{t₀}^t X(s) ds = W(t) − W(t₀)

on that interval.


Theorem 1. The process X(t), t ≥ t₀, defined by

X(t) = ∫_{t₀}^t h(t − s) dW(s),

is a solution to (24) on [t₀, ∞) satisfying the initial conditions

X(t₀) = ··· = X^{(n−1)}(t₀) = 0.

Proof. This result is just what one would expect knowing the deterministic theory. In our proof we assume for simplicity that t₀ = 0. If n = 1, then h(t) = e^{αt}/a₀ and Theorem 1 agrees with the results found in Section 6.1. We assume from now on that n ≥ 2. Then

X(t) = ∫_0^t h(t − s) dW(s),

which by Equation (32) of Chapter 5 can be rewritten as

X(t) = h(0)W(t) + ∫_0^t h′(t − s)W(s) ds.

Since h(0) = 0, we see that

(26) X(t) = ∫_0^t h′(t − s)W(s) ds.

It follows from (26) that

∫_0^t X(s) ds = ∫_0^t (∫_0^s h′(s − u)W(u) du) ds

= ∫_0^t W(u) (∫_u^t h′(s − u) ds) du

= ∫_0^t W(u)(h(t − u) − h(0)) du.

We replace the dummy variable u by s in the last integral, note again that h(0) = 0, and obtain

(27) ∫_0^t X(s) ds = ∫_0^t h(t − s)W(s) ds.

In order to find X'(t) from (26), we will use the calculus formula

(28) (d/dt) ∫_{t₀}^t f(s, t) ds = f(t, t) + ∫_{t₀}^t (∂/∂t) f(s, t) ds,

which is a consequence of the chain rule. It follows from (26) and (28) that

X′(t) = h′(0)W(t) + ∫_0^t h″(t − s)W(s) ds.


If n > 2, then h′(0) = 0, and hence

X′(t) = ∫_0^t h″(t − s)W(s) ds.

By repeated differentiation we conclude that

(29) X^{(j)}(t) = ∫_0^t h^{(j+1)}(t − s)W(s) ds,  0 ≤ j ≤ n − 2.


Since h^{(n−1)}(0) = 1/a₀, we find by differentiating (29) with j = n − 2 that

(30) X^{(n−1)}(t) = W(t)/a₀ + ∫_0^t h^{(n)}(t − s)W(s) ds.

From (29) and (30), we see that

(31) X(0) = X′(0) = ··· = X^{(n−1)}(0) = 0.

It follows from (27), (29), and (30) that

a₀X^{(n−1)}(t) + ··· + aₙ₋₁X(t) + aₙ ∫_0^t X(s) ds

= W(t) + ∫_0^t (a₀h^{(n)}(t − s) + ··· + aₙh(t − s))W(s) ds.

Since h(t) satisfies the homogeneous differential equation (20), the last integral vanishes, and hence

(32) a₀X^{(n−1)}(t) + ··· + aₙ₋₁X(t) + aₙ ∫_0^t X(s) ds = W(t).

We see from (31) and (32) that (25) holds with t₀ = 0. This completes the proof of the theorem. ∎

The general solution to (24) on [t₀, ∞) is given by

(33) X(t) = X(t₀)φ₁(t − t₀) + ··· + X^{(n−1)}(t₀)φₙ(t − t₀) + ∫_{t₀}^t h(t − s) dW(s),  t ≥ t₀.

In more detail, let C₁, …, Cₙ be any n random variables. Then the process X(t), t ≥ t₀, defined by

(34) X(t) = C₁φ₁(t − t₀) + ··· + Cₙφₙ(t − t₀) + ∫_{t₀}^t h(t − s) dW(s)

is such that

(35) X(t₀) = C₁, …, X^{(n−1)}(t₀) = Cₙ,


and hence (33) holds. This is the unique process satisfying (24) and taking on the initial conditions specified by (35).

Let c₁, c₂, …, cₙ be n real constants. Then the solution to (24) on [0, ∞) having the initial conditions

X(0) = c₁, …, X^{(n−1)}(0) = cₙ

is

X(t) = c₁φ₁(t) + ··· + cₙφₙ(t) + ∫_0^t h(t − s) dW(s).

Thus X(t) is normally distributed with mean

(36) EX(t) = c₁φ₁(t) + ··· + cₙφₙ(t)

and variance

(37) Var(X(t)) = σ² ∫_0^t h²(t − s) ds = σ² ∫_0^t h²(s) ds.
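Formula (37) can be spot-checked by simulation for a concrete second order equation (an illustrative sketch of ours, reusing the example h(t) = e^{−t} − e^{−2t} that arises from a₀ = 1, a₁ = 3, a₂ = 2; the function name and sample sizes are invented).

```python
import math
import random
import statistics

def second_order_variance(t=2.0, sigma=1.0, n=400, trials=3000, seed=7):
    """Monte Carlo check of (37) for n = 2 with a0 = 1, a1 = 3, a2 = 2,
    whose impulse response is h(t) = e^{-t} - e^{-2t}:
        X(t) = int_0^t h(t - s) dW(s),
        Var X(t) = sigma^2 int_0^t h^2(s) ds."""
    def h(u):
        return math.exp(-u) - math.exp(-2.0 * u)
    rng = random.Random(seed)
    dt = t / n
    sd = sigma * math.sqrt(dt)
    samples = [sum(h(t - i * dt) * rng.gauss(0.0, sd) for i in range(n))
               for _ in range(trials)]
    # Midpoint-rule quadrature for the exact variance integral.
    exact = sigma ** 2 * sum(h((k + 0.5) * dt) ** 2 * dt for k in range(n))
    return statistics.pvariance(samples), exact
```

The empirical variance should match the quadrature value up to sampling error.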

Suppose now that the left side of (24) is stable. Then

(38) X₀(t) = ∫_{−∞}^t h(t − s) dW(s)

is well defined (except on a set of probability zero which we can ignore) and satisfies (24) on (−∞, ∞). This process is a Gaussian process that has zero means and covariance function

r_{X₀}(s, t) = σ² ∫_{−∞}^{min(s,t)} h(s − u)h(t − u) du = σ² ∫_{−∞}^∞ h(s − u)h(t − u) du.

Thus the process is a second order stationary process and

(39) r_{X₀}(t) = σ² ∫_{−∞}^∞ h(−u)h(t − u) du,  −∞ < t < ∞.

�le will find 'xo(t ) explicitly for n = 2 i n Section 6.2. 1 .) The g��neral solution to (24) on an interval can be written in terms of this process as

(40) where C1 , . . . , Cn are arbitrary random variables. Since 4>l(t), . . . , 4>n(t) all approach zero as t -� 00, it follows from (40) that

lim (X(t) - Xo(t)) = 0 with probability one. t-+ 00

6.2. Differential equations of order n   165

Consider a process X(t), −∞ < t < ∞, of the form (40). This is a second order stationary process if and only if C₁, . . . , Cₙ each equal zero with probability one. Thus the X₀(t) process given by (38) is the unique second order stationary process that satisfies (24) on (−∞, ∞).

Let X(t) be a second order process that satisfies (24) on [0, ∞), where the left side of (24) is stable. Then this process can be represented as in (40), where each of the random variables C₁, . . . , Cₙ has finite second moment. It follows easily that

(41) lim_{t→+∞} E(X(t) − X₀(t))² = 0,

(42) lim_{t→+∞} μ_X(t) = 0,

and

(43) lim_{s,t→+∞} (r_X(s, t) − r_{X₀}(s, t)) = 0.

In other words, any second order process that satisfies (24) on [0, ∞) is asymptotically equal to a second order stationary process having zero means and covariance function given by (39).

We can also consider stochastic differential equations of the form

(44) a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = Y(t),

where Y(t), −∞ < t < ∞, is a second order stationary process. By a solution to (44) on an interval, we mean an n times differentiable second order process X(t) satisfying (44) on that interval. The results for solutions to (24) extend almost verbatim to solutions to (44) if we replace integrals of the form

∫_{t₀}ᵗ h(t−s) dW(s) by ∫_{t₀}ᵗ h(t−s)Y(s) ds,

except that formulas for the mean and covariance functions of the X(t) process are different.

In particular, if the left side of (44) is stable, the unique second order stationary process that satisfies (44) on (−∞, ∞) is given by

X₀(t) = ∫_{−∞}ᵗ h(t−s)Y(s) ds = ∫_{−∞}^{∞} h(t−s)Y(s) ds.

The covariance function of this process is

r_{X₀}(s, t) = ∫_{−∞}^{∞} (∫_{−∞}^{∞} h(s−u)h(t−v) r_Y(v−u) dv) du

or

(45) r_{X₀}(t) = ∫_{−∞}^{∞} (∫_{−∞}^{∞} h(−u)h(t−v) r_Y(v−u) dv) du.


The mean function of this process can be obtained by observing that if (44) holds, then

(46) a₀μ_{X₀}^{(n)}(t) + a₁μ_{X₀}^{(n−1)}(t) + ··· + aₙμ_{X₀}(t) = μ_Y(t).

Since X₀(t) and Y(t) are second order stationary processes, μ_{X₀}(t) and μ_Y(t) take on constant values μ_{X₀} and μ_Y respectively. Thus

μ_{X₀}′(t) = ··· = μ_{X₀}^{(n)}(t) = 0,

so we conclude from (46) that

(47) μ_{X₀} = μ_Y / aₙ.

If Y(t) is a Gaussian process, then so is the X₀(t) process.

Finally, we can combine the above results in a fairly obvious manner by considering solutions to the stochastic differential equation

a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = Y(t) + W′(t).

In particular, in the stable case, the stationary solution to the stochastic differential equation

a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = c + W′(t),

where c is constant, is given by

X₀(t) = c/aₙ + ∫_{−∞}ᵗ h(t−s) dW(s).

This process has constant mean c/aₙ and covariance function given by (39). We can regard this setup as a model for an input–output system in which the input has signal c and noise W′(t). Then the output has signal c/aₙ and noise ∫_{−∞}ᵗ h(t−s) dW(s).

6.2.1. The case n = 2. We will now fill in some of the details in the above discussion for n = 2. In this case (24) becomes

(48) a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t),

whose integrated form on an interval containing the origin is

(49) a₀(X′(t) − X′(0)) + a₁(X(t) − X(0)) + a₂ ∫₀ᵗ X(s) ds = W(t).

The corresponding homogeneous differential equation is

a₀φ″(t) + a₁φ′(t) + a₂φ(t) = 0,

and the characteristic polynomial is

p(r) = a₀r² + a₁r + a₂.


For completeness we will give formulas for φ₁ and φ₂ and show how these formulas are derived.

Let us try to find a solution to the homogeneous equation of the form φ(t) = e^{rt} for some complex constant r. For this choice of φ(t) we find that

a₀φ″(t) + a₁φ′(t) + a₂φ(t) = a₀r²e^{rt} + a₁re^{rt} + a₂e^{rt} = (a₀r² + a₁r + a₂)e^{rt} = p(r)e^{rt}.

Thus φ(t) = e^{rt} satisfies the homogeneous equation if and only if p(r) = 0, i.e., if and only if r is a root of the characteristic polynomial.

In order to obtain specific formulas for φ₁(t) and φ₂(t) we must distinguish three separate cases corresponding to positive, negative, and zero values of the discriminant of the characteristic polynomial.

Case 1. a₁² − 4a₀a₂ > 0.

The characteristic polynomial has two distinct real roots

r₁ = (−a₁ + √(a₁² − 4a₀a₂)) / 2a₀ and r₂ = (−a₁ − √(a₁² − 4a₀a₂)) / 2a₀.

The functions e^{r₁t} and e^{r₂t} are solutions to the homogeneous equation, as is any linear combination c₁e^{r₁t} + c₂e^{r₂t}, where c₁ and c₂ are constants. We now choose c₁ and c₂ so that the solution

φ₁(t) = c₁e^{r₁t} + c₂e^{r₂t}

satisfies the initial conditions φ₁(0) = 1 and φ₁′(0) = 0. Since

φ₁′(t) = c₁r₁e^{r₁t} + c₂r₂e^{r₂t},

we obtain two equations in the two unknowns c₁ and c₂, namely,

c₁ + c₂ = 1 and r₁c₁ + r₂c₂ = 0,

which have the unique solution

c₁ = −r₂ / (r₁ − r₂) and c₂ = r₁ / (r₁ − r₂).


Thus

φ₁(t) = (r₁e^{r₂t} − r₂e^{r₁t}) / (r₁ − r₂).

By similar methods we find that the solution φ₂ to the homogeneous equation having initial conditions φ₂(0) = 0 and φ₂′(0) = 1 is

φ₂(t) = (e^{r₁t} − e^{r₂t}) / (r₁ − r₂).

Case 2. a₁² − 4a₀a₂ < 0.

The characteristic polynomial has two distinct complex-valued roots

r₁ = (−a₁ + i√(4a₀a₂ − a₁²)) / 2a₀ and r₂ = (−a₁ − i√(4a₀a₂ − a₁²)) / 2a₀.

In terms of r₁ and r₂ the functions φ₁(t) and φ₂(t) are given by the same formulas as in Case 1. Alternatively, using the formula e^{iθ} = cos θ + i sin θ and elementary algebra, we can rewrite these formulas as

φ₁(t) = e^{αt} (cos βt − (α/β) sin βt) and φ₂(t) = (1/β) e^{αt} sin βt,

where α and β are real numbers defined by r₁ = α + iβ, or

α = −a₁ / 2a₀ and β = √(4a₀a₂ − a₁²) / 2a₀.

It is clear from these formulas that φ₁(t) and φ₂(t) are real-valued functions.

Case 3. a₁² − 4a₀a₂ = 0.

The characteristic polynomial has the unique real root

r₁ = −a₁ / 2a₀.

One solution to the homogeneous equation is φ(t) = e^{r₁t}. A second such solution is φ(t) = te^{r₁t}:

a₀φ″(t) + a₁φ′(t) + a₂φ(t) = a₀(r₁²t + 2r₁)e^{r₁t} + a₁(r₁t + 1)e^{r₁t} + a₂te^{r₁t}
    = (a₀r₁² + a₁r₁ + a₂)te^{r₁t} + (2a₀r₁ + a₁)e^{r₁t} = 0.


Thus φ₁(t) = c₁e^{r₁t} + c₂te^{r₁t} is a solution to the homogeneous equation for arbitrary constants c₁ and c₂. Choosing c₁ and c₂ so that φ₁(0) = 1 and φ₁′(0) = 0, we find that

φ₁(t) = e^{r₁t}(1 − r₁t).

Similarly, the solution φ₂ satisfying the initial conditions φ₂(0) = 0 and φ₂′(0) = 1 is found to be

φ₂(t) = te^{r₁t}.
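The case formulas above are easy to verify numerically. The sketch below is an illustration, not from the text; the three coefficient triples are made up, one per case. It checks the initial conditions and the residual of a₀φ″ + a₁φ′ + a₂φ = 0 by central differences.

```python
import numpy as np

def fundamental_pair(a0, a1, a2):
    """phi1, phi2 for a0*x'' + a1*x' + a2*x = 0 with phi1(0)=1, phi1'(0)=0,
    phi2(0)=0, phi2'(0)=1, following Cases 1-3 of the text."""
    disc = a1 * a1 - 4.0 * a0 * a2
    if disc > 0:  # Case 1: distinct real roots
        r1 = (-a1 + np.sqrt(disc)) / (2.0 * a0)
        r2 = (-a1 - np.sqrt(disc)) / (2.0 * a0)
        return (lambda t: (r1 * np.exp(r2 * t) - r2 * np.exp(r1 * t)) / (r1 - r2),
                lambda t: (np.exp(r1 * t) - np.exp(r2 * t)) / (r1 - r2))
    if disc < 0:  # Case 2: complex conjugate roots alpha +/- i*beta
        al = -a1 / (2.0 * a0)
        be = np.sqrt(-disc) / (2.0 * a0)
        return (lambda t: np.exp(al * t) * (np.cos(be * t) - (al / be) * np.sin(be * t)),
                lambda t: np.exp(al * t) * np.sin(be * t) / be)
    r1 = -a1 / (2.0 * a0)  # Case 3: double real root
    return (lambda t: np.exp(r1 * t) * (1.0 - r1 * t),
            lambda t: t * np.exp(r1 * t))

def ode_residual(phi, a0, a1, a2, t, h=1e-4):
    """Central-difference estimate of a0*phi'' + a1*phi' + a2*phi at t."""
    d1 = (phi(t + h) - phi(t - h)) / (2.0 * h)
    d2 = (phi(t + h) - 2.0 * phi(t) + phi(t - h)) / (h * h)
    return a0 * d2 + a1 * d1 + a2 * phi(t)

# One made-up parameter set per case: disc > 0, disc < 0, disc = 0.
max_resid = max(
    abs(ode_residual(phi, a0, a1, a2, 0.7))
    for (a0, a1, a2) in [(1.0, 3.0, 2.0), (1.0, 2.0, 2.0), (1.0, 2.0, 1.0)]
    for phi in fundamental_pair(a0, a1, a2)
)
```

All six residuals vanish to within finite-difference accuracy, confirming that each pair solves the homogeneous equation.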

Suppose that the left side of (48) is stable. Then the stationary solution X₀(t) to (48) on (−∞, ∞) has the covariance function given by (39). Since

h(t) = (1/a₀)φ₂(t), t ≥ 0, and h(t) = 0, t < 0,

we can use our formulas for φ₂(t) to compute r_{X₀}(t). The indicated integration is straightforward and leads to the result that

(50) r_{X₀}(t) = (σ² / 2a₁a₂) φ₁(|t|), −∞ < t < ∞,

in all three cases for n = 2. In particular,

(51) Var X₀(t) = r_{X₀}(0) = σ² / 2a₁a₂, −∞ < t < ∞.

Example 2. Consider the stochastic differential equation

X″(t) + 2X′(t) + 2X(t) = W′(t).

(a) Suppose X(t), 0 ≤ t < ∞, is the solution to this equation on [0, ∞) having the initial conditions X(0) = 0 and X′(0) = 1. Find the distribution of X(t) at the first positive time t such that EX(t) = 0.

(b) Consider the stationary solution X₀(t), −∞ < t < ∞, to this equation on (−∞, ∞). Find the first positive time t such that X₀(0) and X₀(t) are uncorrelated.

Since a₁² − 4a₀a₂ = 4 − 8 = −4 < 0, Case 2 is applicable. Now

α = −2/2 = −1 and β = √(8 − 4)/2 = 1.

Thus

φ₁(t) = e⁻ᵗ(cos t + sin t) and h(t) = φ₂(t) = e⁻ᵗ sin t, t ≥ 0.

The mean and variance of the solution having the initial conditions indicated in (a) are given according to (36) and (37) by

EX(t) = φ₂(t) = e⁻ᵗ sin t

and

Var(X(t)) = σ² ∫₀ᵗ e⁻²ˢ sin² s ds.

Evaluating the last integral, we find that

Var(X(t)) = (σ²/8)[1 + e⁻²ᵗ(cos 2t − sin 2t − 2)].

The first positive time t such that EX(t) = 0 is t = π. We see that X(π) is normally distributed with mean 0 and variance σ²(1 − e⁻²ᵖⁱ... that is, σ²(1 − e^{−2π})/8. The covariance function of the stationary solution to the differential equation is given, according to (50), by

r_{X₀}(t) = (σ²/8) e^{−|t|}(cos |t| + sin |t|).

Thus the first positive time t such that X₀(0) and X₀(t) are uncorrelated is t = 3π/4.
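These computations can be checked numerically. The following sketch is an illustration, not part of the text; it takes σ = 1, compares the closed form for Var(X(t)) with direct quadrature of the variance integral, and confirms the zeros at t = π and t = 3π/4.

```python
import math

# Sketch: numerical check of the Example 2 computations, with sigma = 1.
def var_closed(t):
    # (1/8) * [1 + e^{-2t} (cos 2t - sin 2t - 2)], the closed form above
    return (1.0 + math.exp(-2.0 * t) * (math.cos(2.0 * t) - math.sin(2.0 * t) - 2.0)) / 8.0

def var_quad(t, n=20000):
    # Trapezoidal quadrature of int_0^t e^{-2s} sin^2(s) ds
    h = t / n
    f = lambda s: math.exp(-2.0 * s) * math.sin(s) ** 2
    return h * (0.5 * f(0.0) + sum(f(k * h) for k in range(1, n)) + 0.5 * f(t))

mean_at_pi = math.exp(-math.pi) * math.sin(math.pi)             # EX(pi) = e^{-pi} sin(pi)
phi1 = lambda t: math.exp(-t) * (math.cos(t) + math.sin(t))     # phi1 for this equation
```

Quadrature and the closed form agree, EX(π) is zero to rounding, and φ₁(3π/4) is zero to rounding, consistent with the answers above.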

6.3. Estimation theory

In this section we will study problems of estimating a random variable Y by a random variable Ŷ, where Ŷ is required to be defined in terms of a given stochastic process X(t), t ∈ T. In terms of the probability space Ω, we observe a sample function X(t, ω), t ∈ T, and use this information to construct an estimate Ŷ(ω) of Y(ω). Estimation theory is concerned with methods for choosing good estimators.

Example 3. Let X(t), 0 ≤ t < ∞, be a second order process and let 0 ≤ t₀ < t₁. The problem of estimating X(t₁) from X(t), 0 ≤ t ≤ t₀, is called a prediction problem. We think of t₀ as the present, t < t₀ as the past, and t > t₀ as the future. A prediction problem, then, involves estimating future values of a stochastic process from its past and present values. In the absence of any general theory one can only use some intuitively reasonable estimates. We could, for example, estimate X(t₁) by the present value X(t₀). If the X(t) process is differentiable, we could estimate X(t₁) by X(t₀) + (t₁ − t₀)X′(t₀).


Example 4. Let S(t), 0 ≤ t ≤ 1, be a second order process. Let N(t), 0 ≤ t ≤ 1, be a second order process independent of the S(t) process and having zero means. Problems of estimating some random variable defined in terms of the S(t) process based on observation of the process X(t) = S(t) + N(t), 0 ≤ t ≤ 1, are called filtering problems. Thinking of the S(t) process as a "signal" and of the N(t) process as noise, we wish to filter out most of the noise without appreciably distorting the signal. Suppose we want to estimate the S(t) process at some fixed value of t, say t = 1/2. If the signal varies slowly in time and the noise oscillates rapidly about zero, it might be reasonable to estimate S(1/2) by

(1/2ε) ∫_{1/2−ε}^{1/2+ε} X(t) dt

for some suitable ε between 0 and 1/2.

We have discussed two examples of estimation problems and described some ad hoc estimators. In order to formulate estimation as a precise mathematical problem, we need some criterion to use in comparing the accuracy of possible estimators. We will use mean square error as our measure of accuracy. We will estimate random variables Y having finite second moment by random variables Z also having finite second moment. The mean square error of the estimate is E(Z − Y)². If Z₁ and Z₂ are two estimators of Y such that E(Z₁ − Y)² ≤ E(Z₂ − Y)², then Z₁ is considered to be the better estimator.

In any particular estimation problem we must estimate some random variable Y in terms of a process X(t), t ∈ T. A random variable Z is an allowable estimator only if it is defined in terms of the X(t) process. We may further restrict the allowable estimators by requiring that they depend on the X(t) process in some suitably simple manner. In any case we obtain some collection ℳ of random variables which we consider to be the allowable estimators. An optimal estimator of Y is a random variable Ŷ in ℳ such that

(52) E(Ŷ − Y)² = min_{Z∈ℳ} E(Z − Y)².

The estimators are required to have finite second moment, so that

(i) if Z is in ℳ, then EZ² < ∞.

In almost all cases of interest, ℳ is such that

(ii) if Z₁ and Z₂ are in ℳ and a₁ and a₂ are real constants, then a₁Z₁ + a₂Z₂ is in ℳ.

If condition (ii) holds, then ℳ is a vector space. To verify that optimal estimators exist, it is usually necessary for ℳ to be such that


(iii) if Z₁, Z₂, . . . are in ℳ and Z is a random variable such that limₙ→∞ E(Zₙ − Z)² = 0, then Z is in ℳ.

Condition (iii) states that if Z is the mean square limit of random variables in ℳ, then Z is in ℳ. In other words, this condition states that ℳ is closed under mean square convergence.

Example 5. Linear estimation. Consider a second order process X(t), t ∈ T. Let ℳ₀ be the collection of all random variables that are of the form of a constant plus a finite linear combination of the random variables X(t), t ∈ T. Thus a random variable is in ℳ₀ if and only if it is of the form

a + b₁X(s₁) + ··· + bₙX(sₙ)

for some positive integer n, some numbers s₁, . . . , sₙ each in T, and some real numbers a, b₁, . . . , bₙ. The collection ℳ₀ satisfies (i) and (ii), but it does not in general satisfy (iii), because certain random variables involving integration or differentiation, e.g., X′(t₁) for some t₁ ∈ T, may be well defined in terms of the X(t) process but not be in ℳ₀. Such random variables, however, can be mean square limits of random variables in ℳ₀ under appropriate conditions, as we saw in Section 5.3. This leads us to consider the collection ℳ of all random variables which arise as mean square limits of random variables in ℳ₀. Clearly ℳ contains ℳ₀. It can be shown that ℳ satisfies conditions (i), (ii), and (iii). Estimation problems involving this choice of ℳ are called linear estimation problems.

Example 6. Nonlinear estimation. Let X(t), t ∈ T, be a second order process as in the previous example. Let ℳ₀ be the collection of all random variables having finite second moment and of the form

f(X(s₁), . . . , X(sₙ)),

where n ranges over all positive integers, s₁, . . . , sₙ range over T, and f is an arbitrary real-valued function on Rⁿ (subject to a technical condition involving "measurability"). Again ℳ₀ satisfies conditions (i) and (ii) but not necessarily (iii). The larger collection ℳ of all random variables arising as mean square limits of random variables in ℳ₀ satisfies all three conditions. Estimation problems involving this choice of ℳ are called nonlinear estimation problems.

The extension from ℳ₀ to ℳ in the above two examples is necessary only if the parameter set T is infinite. If T is a finite set, then ℳ₀ = ℳ in these examples.


6.3.1. General principles of estimation. Most methods for finding optimal estimators are based on the following theorem.

Theorem 2. Let ℳ satisfy conditions (i) and (ii). Then Ŷ ∈ ℳ is an optimal estimator of Y if and only if

(53) E(Ŷ − Y)Z = 0, Z ∈ ℳ.

If Ŷ and Ỹ are both optimal estimators of Y, then E(Ŷ − Ỹ)² = 0 and hence Ŷ = Ỹ with probability one; in this sense the optimal estimator of Y is uniquely determined.

Two random variables Z₁ and Z₂, each having finite second moment, are said to be orthogonal to each other if EZ₁Z₂ = 0. Theorem 2 asserts that an optimal estimator of Y in terms of a random variable lying in ℳ is the unique random variable Ŷ in ℳ such that Ŷ − Y is orthogonal to all the random variables lying in ℳ (see Figure 2).

Figure 2

Proof. Let Ŷ ∈ ℳ be an optimal estimator of Y and let Z be in ℳ. Then by condition (ii), Ŷ + aZ is in ℳ. It follows from (52) that

E(Ŷ − Y)² ≤ E(Ŷ + aZ − Y)², −∞ < a < ∞.

In other words, the function f defined by

f(a) = E(Ŷ + aZ − Y)² = E(Ŷ − Y)² + 2aE(Ŷ − Y)Z + a²EZ²

has a minimum at a = 0. Thus

0 = f′(0) = 2E(Ŷ − Y)Z,

which shows that (53) holds.


Suppose now that Ŷ ∈ ℳ and (53) holds. Let Ỹ be any random variable in ℳ. Then

E(Ỹ − Y)² = E(Ỹ − Ŷ + Ŷ − Y)² = E(Ŷ − Y)² + 2E(Ŷ − Y)(Ỹ − Ŷ) + E(Ỹ − Ŷ)².

Since Ỹ − Ŷ is in ℳ, we can apply (53) with Z = Ỹ − Ŷ to conclude that E(Ŷ − Y)(Ỹ − Ŷ) = 0, and hence that

(54) E(Ỹ − Y)² = E(Ŷ − Y)² + E(Ỹ − Ŷ)².

Since E(Ỹ − Ŷ)² ≥ 0, (54) shows that Ŷ is at least as good an estimator of Y as is Ỹ. Since Ỹ is an arbitrary random variable in ℳ, Ŷ is an optimal estimator of Y. If Ỹ is also an optimal estimator of Y, then by (54) we see that E(Ỹ − Ŷ)² = 0. This completes the proof of the theorem. ∎

It can be shown that if ℳ satisfies condition (iii) as well as (i) and (ii), then there is always an optimal estimator of Y.

Let X(t), t ∈ T, be a second order process and let ℳ be as in Example 5. Let Ŷ be the optimal linear estimator of a random variable Y. Since the constant random variable Z = 1 is in ℳ₀ and hence in ℳ, it follows from (53) that

(55) E(Ŷ − Y) = 0.

Since the random variable X(t) is in ℳ₀ ⊂ ℳ for t ∈ T,

(56) E(Ŷ − Y)X(t) = 0, t ∈ T.

Conversely, if Ŷ ∈ ℳ satisfies (55) and (56), then Ŷ is the optimal linear estimator of Y. The proof of this result is left as an exercise.

Let X(t), t ∈ T, be a second order process and let Y be a random variable as before. Suppose now that for every positive integer n and every choice of s₁, . . . , sₙ all in T, the random variables X(s₁), . . . , X(sₙ), Y have a joint normal distribution. It can be shown that in this case the optimal linear estimator of Y and the optimal nonlinear estimator of Y coincide. The proof depends basically on the fact that if X(s₁), . . . , X(sₙ), Y have a joint normal distribution, then

E[Y | X(s₁), . . . , X(sₙ)] = a + b₁X(s₁) + ··· + bₙX(sₙ)

for suitable constants a, b₁, . . . , bₙ.
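When T is finite, (55) and (56) reduce to n + 1 linear equations (the normal equations) for the coefficients of Ŷ = a + b₁X(s₁) + ··· + bₙX(sₙ). The sketch below is an illustration, not an example from the text; the means and covariances are made up. It solves the system for n = 2 and checks that both orthogonality conditions hold.

```python
import numpy as np

# Hypothetical second-moment data (illustration only): two observations
# X(s1), X(s2) and a target Y, specified through their means and covariances.
mean_x = np.array([1.0, 0.0])
mean_y = 2.0
cov_xx = np.array([[2.0, 1.0], [1.0, 2.0]])   # Cov(X(s_i), X(s_j))
cov_xy = np.array([1.0, 0.5])                 # Cov(X(s_i), Y)

# Normal equations: choose b with cov_xx @ b = cov_xy, then a so that (55) holds.
b = np.linalg.solve(cov_xx, cov_xy)
a = mean_y - b @ mean_x

# Express E(Yhat - Y) and E[(Yhat - Y) X(s_i)] in terms of the given moments.
bias = a + b @ mean_x - mean_y                 # condition (55)
orth = cov_xx @ b - cov_xy + bias * mean_x     # condition (56), componentwise
```

Both residuals vanish, so by the converse statement above Ŷ is the optimal linear estimator for this toy problem.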

6.3.2. Some examples of optimal prediction. We will close this section by discussing some examples of prediction problems in which the optimal predictor takes on a particularly simple form.


Example 7. Let W′(t) represent white noise with parameter σ² and let the observed process X(t), 0 ≤ t < ∞, be the solution to the differential equation

(57) a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t), 0 ≤ t < ∞,

satisfying the deterministic initial conditions

X(0) = x₀ and X′(0) = v₀.

Let 0 < t₁ < t₂. Find the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, and find the mean square error of prediction.

As we saw in Section 6.2, we can write the solution to (57) as

(58) X(t) = X(0)φ₁(t) + X′(0)φ₂(t) + ∫₀ᵗ h(t−s) dW(s), 0 ≤ t < ∞,

where φ₁, φ₂, and h are defined explicitly in Section 6.2.1. We have similarly that

X(t) = X(t₁)φ₁(t−t₁) + X′(t₁)φ₂(t−t₁) + ∫_{t₁}ᵗ h(t−s) dW(s), t ≥ t₁.

Set

g(t₂) = X(t₁)φ₁(t₂−t₁) + X′(t₁)φ₂(t₂−t₁).

Then

X(t₂) − g(t₂) = ∫_{t₁}^{t₂} h(t₂−s) dW(s).

We will show that g(t₂) is the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁.

We note first that

E(g(t₂) − X(t₂)) = −E ∫_{t₁}^{t₂} h(t₂−s) dW(s) = 0.

By (41) of Chapter 5,

E[∫₀ᵗ h(t−s) dW(s) · ∫_{t₁}^{t₂} h(t₂−s) dW(s)] = 0, 0 ≤ t ≤ t₁.

Using (58) and the fact that X(0) and X′(0) have the respective deterministic values x₀ and v₀, we now conclude that for 0 ≤ t ≤ t₁,

E[X(t)(g(t₂) − X(t₂))] = E[(x₀φ₁(t) + v₀φ₂(t) + ∫₀ᵗ h(t−s) dW(s)) × (−∫_{t₁}^{t₂} h(t₂−s) dW(s))] = 0.


Thus, to show that g(t₂) is the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, it is enough to show that g(t₂) is the limit in mean square of linear combinations of the random variables X(t), 0 ≤ t ≤ t₁. To do this we need only show that X′(t₁) is such a limit in mean square. But from Equation (24) of Chapter 5 we see that X′(t₁) is the limit in mean square of the random variables

[X(t₁) − X(t₁ − 1/n)] / (1/n)

as n → +∞. This concludes the proof that g(t₂) is the desired optimal predictor of X(t₂).

The mean square error of the predictor g(t₂) is

E(g(t₂) − X(t₂))² = σ² ∫_{t₁}^{t₂} h²(t₂−s) ds

or

E(g(t₂) − X(t₂))² = σ² ∫₀^{t₂−t₁} h²(s) ds.

There are several worthwhile observations to be made concerning this example. First, g(t), t ≥ t₁, can be uniquely defined as that function which satisfies the homogeneous equation

a₀g″(t) + a₁g′(t) + a₂g(t) = 0, t > t₁,

and the initial conditions

g(t₁) = X(t₁) and g′(t₁) = X′(t₁).

Secondly, the mean square error of prediction depends only on the distance between t₁ and t₂ and is an increasing function of that distance. Let ε be any positive number less than t₁. Then the predictor g(t₂) is the limit in mean square of linear combinations of the random variables X(t), t₁ − ε ≤ t ≤ t₁. Thus in predicting X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, we need only observe X(t), t₁ − ε ≤ t ≤ t₁, for an arbitrarily small positive number ε. Finally, since the X(t) process is a Gaussian process, the optimal linear predictor g(t₂) of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, is also the optimal nonlinear predictor.


The results of Example 7 are readily extended to prediction of stochastic processes defined as solutions to differential equations of order n having white noise inputs. Suppose that X(t), t ≥ 0, is defined by requiring that

(59) a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = W′(t)

on 0 ≤ t < ∞, and that X(0), . . . , X^{(n−1)}(0) take on n respective deterministic values. Let φ₁, . . . , φₙ and h be as in Section 6.2. Then for 0 < t₁ < t₂, the optimal (linear or nonlinear) predictor g(t₂) of X(t₂) given X(t), 0 ≤ t ≤ t₁, is given by

(60) g(t₂) = X(t₁)φ₁(t₂−t₁) + ··· + X^{(n−1)}(t₁)φₙ(t₂−t₁).

The corresponding function g(t), t ≥ t₁, is the unique function that satisfies the homogeneous equation

(61) a₀g^{(n)}(t) + a₁g^{(n−1)}(t) + ··· + aₙg(t) = 0, t > t₁,

and the initial conditions

(62) g(t₁) = X(t₁), . . . , g^{(n−1)}(t₁) = X^{(n−1)}(t₁).

The mean square error of prediction is given by

(63) E(g(t₂) − X(t₂))² = σ² ∫₀^{t₂−t₁} h²(s) ds.

Suppose now that the left side of (59) is stable and let X(t), −∞ < t < ∞, be the stationary solution to (59) on (−∞, ∞). Then for −∞ < t₁ < t₂, the optimal (linear or nonlinear) predictor g(t₂) of X(t₂) in terms of X(t), −∞ < t ≤ t₁, is again given by (60) or (61)–(62), and (63) remains valid.
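For the equation of Example 2 (X″ + 2X′ + 2X = W′(t), so a₀ = 1 and h(t) = e⁻ᵗ sin t), the error (63) can be evaluated by quadrature. The sketch below is an illustration only, not from the text: σ is taken to be 1 and the lead times are arbitrary. It checks that the mean square error increases with the lead time t₂ − t₁ and approaches the stationary variance σ²/(2a₁a₂) = 1/8 from (51).

```python
import math

def mse(d, n=4000):
    """(63) for X'' + 2X' + 2X = W'(t), sigma = 1: int_0^d e^{-2s} sin^2(s) ds."""
    step = d / n
    f = lambda s: math.exp(-2.0 * s) * math.sin(s) ** 2
    return step * (0.5 * f(0.0) + sum(f(k * step) for k in range(1, n)) + 0.5 * f(d))

lead_times = [0.5, 1.0, 2.0, 4.0, 8.0]     # arbitrary values of t2 - t1
errors = [mse(d) for d in lead_times]
```

As the observations above lead one to expect, the errors form an increasing sequence and the last one is already indistinguishable from 1/8: predicting far ahead is no better than using the stationary mean.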

6.4. Spectral distribution

Let X(t), −∞ < t < ∞, be a second order stationary process whose covariance function is such that

∫_{−∞}^{∞} |r_X(t)| dt < ∞.

The spectral density function f_X(λ), −∞ < λ < ∞, is defined by

(64) f_X(λ) = (1/2π) ∫_{−∞}^{∞} e^{−iλt} r_X(t) dt, −∞ < λ < ∞.

Techniques involving spectral densities are widely used in estimation problems involving second order stationary processes. Though these techniques are often easy to implement, a proper understanding of them


requires material such as complex variable theory that would take us too far afield to discuss. In this section we will simply discuss some elementary properties of spectral densities, find them explicitly in a few special cases, and introduce the more general concept of a spectral distribution function.

Since r_X(−t) = r_X(t), sin λt r_X(t) is an odd function of t, and hence

∫_{−∞}^{∞} sin λt r_X(t) dt = 0.

Recalling that e^{−iλt} = cos λt − i sin λt, we conclude from (64) that

f_X(λ) = (1/2π) ∫_{−∞}^{∞} cos λt r_X(t) dt, −∞ < λ < ∞.

It is clear that f_X is a real-valued function which is symmetric about the origin, i.e., f_X(−λ) = f_X(λ). Using the fact that covariance functions are nonnegative definite and approximating integrals by sums, one can show that f_X is a nonnegative function. It is also possible to show that f_X is integrable on (−∞, ∞) and that r_X is given in terms of f_X by the Fourier transform

r_X(t) = ∫_{−∞}^{∞} e^{iλt} f_X(λ) dλ, −∞ < t < ∞.

Since f_X is symmetric about the origin, this reduces to

r_X(t) = ∫_{−∞}^{∞} cos λt f_X(λ) dλ.

In particular,

Var X(t) = r_X(0) = ∫_{−∞}^{∞} f_X(λ) dλ.

The function F_X, defined by

F_X(λ) = ∫_{−∞}^{λ} f_X(u) du, −∞ < λ < ∞,

is called the spectral distribution function of the process. It is not a probability distribution function and f_X is not a probability density function unless r_X(0) = 1.

Example 8. Let X(t), −∞ < t < ∞, be the process from Example 2 of Chapter 4. Its covariance function is of the form

r_X(t) = αe^{−β|t|},

where α and β are suitable positive constants. Thus

f_X(λ) = (α/2π) ∫_{−∞}^{∞} e^{−iλt} e^{−β|t|} dt.


Now

∫₀^{∞} e^{−iλt} e^{−βt} dt = ∫₀^{∞} e^{−(β+iλ)t} dt = [−e^{−(β+iλ)t} / (β+iλ)]₀^{∞} = 1 / (β+iλ),

since

lim_{t→∞} |e^{−(β+iλ)t}| = lim_{t→∞} e^{−βt} = 0.

Similarly,

∫_{−∞}^{0} e^{−iλt} e^{βt} dt = 1 / (β−iλ).

Therefore

f_X(λ) = (α/2π) (1/(β+iλ) + 1/(β−iλ)) = αβ / (π(β² + λ²)).

Consider a stochastic differential equation

a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = Y(t), −∞ < t < ∞,

whose input is a second order stationary process and whose left side is stable. As we saw in Section 6.2, the stationary solution to this differential equation is given by

X(t) = ∫_{−∞}^{∞} h(t−s) Y(s) ds,

where h is the impulse response function. The covariance function of the solution is

(65) r_X(t) = ∫_{−∞}^{∞} (∫_{−∞}^{∞} h(−u)h(t−v) r_Y(v−u) dv) du.

The function h is such that

∫_{−∞}^{∞} |h(t)| dt < ∞.

Suppose that

∫_{−∞}^{∞} |r_Y(t)| dt < ∞.

It then follows easily from (65) that

∫_{−∞}^{∞} |r_X(t)| dt ≤ (∫_{−∞}^{∞} |h(t)| dt)² ∫_{−∞}^{∞} |r_Y(t)| dt < ∞.


Consequently the Y(t) process and the X(t) process both have spectral density functions. To find the relationship between these two spectral density functions, we first define the frequency response function H by

H(λ) = ∫_{−∞}^{∞} e^{−iλt} h(t) dt, −∞ < λ < ∞.

We will show that

(66) f_X(λ) = f_Y(λ) |H(λ)|², −∞ < λ < ∞.

It is certainly much easier to compute the spectral density of the X(t) process by (66) than to compute the covariance function by (65). Of course, if one is ultimately interested in the covariance function r_X(t), it is necessary either to use (65) or else to compute the Fourier transform

r_X(t) = ∫_{−∞}^{∞} e^{iλt} f_Y(λ) |H(λ)|² dλ.

In many cases this Fourier transform can be evaluated easily by using complex variable theory.

Formula (66) follows from (65) in a straightforward manner. We start with

f_X(λ) = (1/2π) ∫_{−∞}^{∞} e^{−iλt} r_X(t) dt
       = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−iλt} h(−u)h(t−v) r_Y(v−u) dt dv du.

We first integrate with respect to t:

∫_{−∞}^{∞} e^{−iλt} h(t−v) dt = e^{−iλv} ∫_{−∞}^{∞} e^{−iλ(t−v)} h(t−v) dt = e^{−iλv} ∫_{−∞}^{∞} e^{−iλt} h(t) dt = e^{−iλv} H(λ).

Thus

f_X(λ) = (H(λ)/2π) ∫_{−∞}^{∞} h(−u) (∫_{−∞}^{∞} e^{−iλv} r_Y(v−u) dv) du.

Also,

(1/2π) ∫_{−∞}^{∞} e^{−iλv} r_Y(v−u) dv = (e^{−iλu}/2π) ∫_{−∞}^{∞} e^{−iλ(v−u)} r_Y(v−u) dv
    = (e^{−iλu}/2π) ∫_{−∞}^{∞} e^{−iλv} r_Y(v) dv = e^{−iλu} f_Y(λ).


Consequently,

f_X(λ) = f_Y(λ) H(λ) ∫_{−∞}^{∞} e^{−iλu} h(−u) du = f_Y(λ) H(λ) ∫_{−∞}^{∞} e^{iλu} h(u) du = f_Y(λ) H(λ) H(−λ).

It is left as an exercise for the reader to show that

(67) H(λ)H(−λ) = |H(λ)|², −∞ < λ < ∞.

From this and the preceding result, we obtain (66) as desired.


Of course, in order for (66) to be useful we must be able to compute the frequency response function H. This turns out to be surprisingly easy:

(68) H(λ) = 1 / (a₀(iλ)ⁿ + a₁(iλ)^{n−1} + ··· + aₙ), −∞ < λ < ∞.

We will now prove (68). The impulse response function h is such that if y(t), −∞ < t < ∞, is a bounded continuous function and

x(t) = ∫_{−∞}^{∞} h(t−s) y(s) ds,

then

(69) a₀x^{(n)}(t) + a₁x^{(n−1)}(t) + ··· + aₙx(t) = y(t), −∞ < t < ∞.

This is true even if y(t) is a complex-valued function. Choose −∞ < λ < ∞ and set

y(t) = e^{iλt}, −∞ < t < ∞.

Then

x(t) = ∫_{−∞}^{∞} h(t−s) e^{iλs} ds.

By setting u = t − s we conclude that

x(t) = ∫_{−∞}^{∞} h(u) e^{iλ(t−u)} du = e^{iλt} H(λ),

and hence that

x^{(j)}(t) = (iλ)ʲ H(λ) e^{iλt}.

Substituting this into (69) we find that

(a₀(iλ)ⁿ + a₁(iλ)^{n−1} + ··· + aₙ) H(λ) e^{iλt} = e^{iλt},

which implies that (68) holds as desired.


Consider again the stable stochastic differential equation

(70) a₀X^{(n)}(t) + a₁X^{(n−1)}(t) + ··· + aₙX(t) = W′(t), −∞ < t < ∞,

where W′(t) is white noise with parameter σ². The covariance function of the stationary solution to (70) is given by

(71) r_X(t) = σ² ∫_{−∞}^{∞} h(−u)h(t−u) du.

It follows easily from this that

∫_{−∞}^{∞} |r_X(t)| dt ≤ σ² (∫_{−∞}^{∞} |h(s)| ds)² < ∞,

so that the X(t) process has a well defined spectral density function f_X(λ). It is left as an exercise for the reader to use (71) to show that

(72) f_X(λ) = (σ²/2π) |H(λ)|², −∞ < λ < ∞.
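Formula (72) can at least be checked numerically in a special case. The sketch below rests on assumptions not stated in the text: it takes the first-order equation a₀X′(t) + a₁X(t) = W′(t), for which h(t) = (1/a₀)e^{−a₁t/a₀} and (71) gives r_X(t) = σ²e^{−a₁|t|/a₀}/(2a₀a₁), with made-up parameter values. The cosine-transform definition of f_X then agrees with (σ²/2π)|H(λ)|².

```python
import math

# Assumed first-order example: a0*X' + a1*X = W'(t), h(t) = (1/a0) e^{-a1 t / a0}.
a0, a1, sigma = 1.0, 2.0, 1.5
beta = a1 / a0

def f_from_covariance(lam, big_t=60.0, n=60000):
    """(1/2pi) int_{-T}^{T} cos(lam*t) r_X(t) dt with r_X(t) from (71)."""
    c = sigma**2 / (2.0 * a0 * a1)
    step = 2.0 * big_t / n
    total = 0.0
    for k in range(n + 1):  # trapezoidal rule; t = 0 falls on a grid point
        t = -big_t + k * step
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.cos(lam * t) * c * math.exp(-beta * abs(t))
    return step * total / (2.0 * math.pi)

def f_from_h_response(lam):
    """(72): (sigma^2 / 2pi) |H(lam)|^2 with H(lam) = 1/(a0*i*lam + a1), by (68)."""
    return sigma**2 / (2.0 * math.pi * (a0**2 * lam**2 + a1**2))

max_gap = max(abs(f_from_covariance(l) - f_from_h_response(l))
              for l in (0.0, 0.5, 1.0, 2.0))
```

The two evaluations agree to quadrature accuracy at every frequency tried, as (72) predicts.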

Since white noise is not a process in the ordinary sense, its spectral density function cannot be simply defined as a Fourier transform of the covariance function. Suppose the spectral density f_{W′}(λ) is defined so that, in analogy with (66),

(73) f_X(λ) = f_{W′}(λ) |H(λ)|², −∞ < λ < ∞.

From (72) and (73) we find that

(74) f_{W′}(λ) = σ²/2π, −∞ < λ < ∞.

We consider (74) as defining the spectral density of white noise. From this definition we see that the spectral density function of white noise is constant over all "frequencies" λ. It is this property of white noise that suggests its name.

Let X(t), −∞ < t < ∞, be any second order stationary process. A result known as Bochner's Theorem asserts that if r_X(0) > 0, the function r_X(t)/r_X(0), −∞ < t < ∞, is necessarily the characteristic function of some probability distribution function G_X(λ), −∞ < λ < ∞. The function F_X defined for r_X(0) > 0 by

F_X(λ) = r_X(0) G_X(λ), −∞ < λ < ∞,

and by F_X(λ) = 0, −∞ < λ < ∞, if r_X(0) = 0, is called the spectral distribution function of the process. Since a probability distribution


function is uniquely determined by its characteristic function, it follows that G_X, and hence also the spectral distribution function, is uniquely determined by the covariance function. If F_X is of the form

F_X(λ) = ∫_{−∞}^{λ} f_X(u) du, −∞ < λ < ∞,

for some nonnegative function f_X, then f_X is called the spectral density function of the process; in this case

r_X(t) = ∫_{−∞}^{∞} e^{iλt} f_X(λ) dλ, −∞ < t < ∞.

For the benefit of those readers who are familiar with the Stieltjes integral, it should be pointed out that the covariance function can in general be expressed in terms of the spectral distribution function by means of the Stieltjes integral

r_X(t) = ∫_{−∞}^{∞} e^{itλ} dF_X(λ), −∞ < t < ∞.

If

∫_{−∞}^{∞} |r_X(t)| dt < ∞,

the definitions given here are equivalent to those given earlier in this section.

Example 9.  Let X(t), −∞ < t < ∞, be a second order stationary process, such as in Example 1 of Chapter 4, whose covariance function is given by

r_X(t) = σ² cos λ₁t,    −∞ < t < ∞,

where λ₁ > 0. Find the spectral distribution function of the process.

Suppose first that σ² > 0. Then r_X(0) = σ² > 0 and

r_X(t)/r_X(0) = cos λ₁t

is the characteristic function of a random variable which assigns probability ½ to each of the two points −λ₁ and λ₁. After multiplying the corresponding probability distribution function by r_X(0) = σ², we find that

F_X(λ) = 0,     −∞ < λ < −λ₁,
F_X(λ) = σ²/2,  −λ₁ ≤ λ < λ₁,
F_X(λ) = σ²,    λ₁ ≤ λ < ∞.

Clearly this formula is also correct if σ² = 0.
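The computation in Example 9 can be verified numerically: the characteristic function of the two-point distribution placing mass ½ at ±λ₁ is cos λ₁t, so the spectral distribution with jumps of size σ²/2 at ±λ₁ reproduces the covariance σ² cos λ₁t. The particular values of σ² and λ₁ below are arbitrary:

```python
import numpy as np

# Sketch verifying Example 9: the two-point spectral distribution with
# mass sigma2/2 at -lam1 and +lam1 gives back r(t) = sigma2*cos(lam1*t)
# via r(t) = ∫ e^{itλ} dF(λ), here a two-term Stieltjes sum.
sigma2, lam1 = 1.5, 2.0
t = np.linspace(0.0, 5.0, 101)
r_from_F = (sigma2 / 2) * np.exp(1j * t * (-lam1)) + (sigma2 / 2) * np.exp(1j * t * lam1)
ok = np.allclose(r_from_F.real, sigma2 * np.cos(lam1 * t)) and np.allclose(r_from_F.imag, 0.0)
```

The imaginary parts cancel because the two spectral masses sit symmetrically about zero, as they must for a real covariance function.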

Stochastic Differential Equations

Exercises

1  Let X and Y be random variables each having finite second moment. Show that E(X − Y)² ≤ 2EX² + 2EY². Hint: Verify the identity E(X − Y)² + E(X + Y)² = 2EX² + 2EY².

2  Let m and f be positive constants and let v₀ and x₀ be real constants. The process V(t), t ≥ 0, defined as the solution to the stochastic differential equation

mV′(t) + fV(t) = W′(t),    V(0) = v₀,

is known as Langevin's velocity process.
(a) Express this velocity process in terms of white noise.
(b) Find its mean and covariance function.
The process X(t), t ≥ 0, defined as the solution to the stochastic differential equation

mX″(t) + fX′(t) = W′(t),    X(0) = x₀, X′(0) = v₀,

is called the Ornstein-Uhlenbeck process.
(c) Express the Ornstein-Uhlenbeck process in terms of white noise.
(d) Express the Ornstein-Uhlenbeck process in terms of Langevin's velocity process.
(e) Find the mean and variance of the Ornstein-Uhlenbeck process at time t.

3  Let m and f be positive constants and let V₀(t), −∞ < t < ∞, be the stationary solution to the stochastic differential equation

mV′(t) + fV(t) = W′(t).

(a) Express V₀(t) in terms of white noise.
(b) Find its mean and covariance function.
(c) Show directly that

lim_{s,t→+∞} (r_V(s, t) − r_{V₀}(s, t)) = 0,

where V(t), t ≥ 0, is from Exercise 2.
(d) Set

X₀(t) = ∫₀ᵗ V₀(s) ds,    t ≥ 0.

Show that the X₀(t) process satisfies the stochastic differential equation

mX″(t) + fX′(t) = W′(t),    t ≥ 0.

(e) Express X₀(t) in terms of white noise.
(f) Find the mean and variance of X₀(t).


4  Is there a stationary solution to the stochastic differential equation

a₀X′(t) + a₁X(t) = W′(t)

on −∞ < t < ∞ if α = −a₁/a₀ is positive? If so, how can it be expressed in terms of white noise?

5  Let c be a real constant.
(a) Define precisely what should be meant by a solution to the stochastic differential equation

a₀X′(t) + a₁X(t) = c + W′(t).

(b) Show that the general solution to this equation on 0 ≤ t < ∞ is

X(t) = X(0)e^{αt} + (c/(a₀α))(e^{αt} − 1) + (1/a₀)∫₀ᵗ e^{α(t−s)} dW(s),

where α = −a₁/a₀.
(c) Suppose α < 0. Find the stationary solution to the equation on −∞ < t < ∞, and show that it is the unique such solution.

6  In each of the following stochastic differential equations find Var X(t), t ≥ 0, for the solution on 0 ≤ t < ∞ having initial conditions X(0) = 0 and X′(0) = 0; if the left side of the equation is stable, find the covariance function of the stationary solution on −∞ < t < ∞.
(a) X″(t) + X′(t) = W′(t);
(b) X″(t) + 3X′(t) + 2X(t) = W′(t);
(c) 4X″(t) + 8X′(t) + 5X(t) = W′(t);
(d) X″(t) + 2X′(t) + X(t) = W′(t);
(e) X″(t) + X(t) = W′(t).

7  Show that the left side of the stochastic differential equation

a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t)

is stable if and only if the coefficients a₀, a₁, and a₂ are either all positive or all negative.

8  Suppose that the left side of the stochastic differential equation

a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t)

is stable and let X₀(t), −∞ < t < ∞, be its stationary solution.
(a) Show that in Cases 1 and 3 the correlation between X₀(s) and X₀(t) is positive for all s and t.
(b) Show that in Case 2 there exist choices of s and t such that X₀(s) and X₀(t) are negatively correlated.

9  Let X₀(t), −∞ < t < ∞, be the stationary solution to

a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t),

where the left side of this stochastic differential equation is stable.


(a) Show that X₀′(t) has the covariance function

r_{X₀′}(t) = −(σ²/(2a₁a₂)) φ₁″(|t|).

(b) Find φ₁″(0) and use this to compute Var X₀′(t). Hint: Use the definition of φ₁(t) rather than its explicit formula.

10  Let X(t), −∞ < t < ∞, satisfy the stochastic differential equation

a₀X′(t) + a₁X(t) = W′(t).

Let Y(t), −∞ < t < ∞, satisfy the stochastic differential equation

b₀Y′(t) + b₁Y(t) = X(t).

Show that the Y(t) process satisfies the stochastic differential equation

a₀b₀Y″(t) + (a₀b₁ + a₁b₀)Y′(t) + a₁b₁Y(t) = W′(t).

11  Let Y(t), −∞ < t < ∞, be a second order stationary process having continuous sample functions, mean μ_Y = 1, and covariance function r_Y(t) = e^{−|t|}, −∞ < t < ∞.
(a) Find the mean and covariance function of the stationary solution X₀(t), −∞ < t < ∞, to the stochastic differential equation

X′(t) + X(t) = Y(t).

(b) Find the mean and covariance functions of the solution X(t), 0 ≤ t < ∞, to this stochastic differential equation satisfying the initial condition X(0) = 0.
(c) Show directly that

lim_{s,t→+∞} (r_X(s, t) − r_{X₀}(s, t)) = 0.

12  Let ℳ be as in Example 5 and suppose that Ŷ ∈ ℳ satisfies (55) and (56). Show that Ŷ is the optimal linear estimator of Y.

13  Let X(t), −∞ < t < ∞, be a second order stationary process having mean zero and covariance function r(t), −∞ < t < ∞.
(a) Find the optimal predictor of X(1) of the form bX(0), and determine the mean square error of prediction.
(b) Find the optimal predictor of X(1) of the form b₁X(0) + b₂X′(0), and determine the mean square error of prediction. Assume here that the X(t) process is differentiable.
(c) Find the optimal estimator of ∫₀¹ X(t) dt of the form b₁X(0) + b₂X(1), and determine the mean square error of estimation. Assume here that |r(1)| < r(0).

14  Show that for t₁ < t₂ the optimal (linear or nonlinear) predictor of W(t₂) in terms of W(t), t ≤ t₁, is Ŵ(t₂) = W(t₁).

15  Let X(t), −∞ < t < ∞, be a second order stationary process having mean zero and continuous covariance function. Show that the


optimal linear predictor of X(t + s) in terms of X(0) and X(s) is the same as the optimal linear predictor of X(t + s) in terms of X(s) for all s > 0 and t > 0 if and only if

r(t) = r(0)e^{−α|t|},    −∞ < t < ∞,

for some nonnegative constant α. Hint: Use the fact that a bounded continuous real-valued function f(t), 0 ≤ t < ∞, satisfies the equation

f(s + t) = f(s)f(t),    s ≥ 0, t ≥ 0,

if and only if

f(t) = f(0)e^{−αt},    0 ≤ t < ∞,

for some nonnegative constant α.

16  Let X(t), 0 ≤ t < ∞, be the solution to the stochastic differential equation

a₀X′(t) + a₁X(t) = W′(t)

satisfying the initial condition X(0) = 0. Find the optimal (linear or nonlinear) predictor of X(t₁ + τ) by X(t), 0 ≤ t ≤ t₁, where t₁ and τ are positive constants. Determine the mean square error of prediction.

17  For each of the stochastic differential equations in Exercise 6 let X(t), 0 ≤ t < ∞, be the solution satisfying the initial conditions X(0) = 0 and X′(0) = 0 (or any other deterministic initial conditions). Find explicitly the optimal (linear or nonlinear) predictor of X(t₁ + τ) in terms of X(t), 0 ≤ t ≤ t₁, where t₁ and τ are positive constants.

18  Verify Formula (67).

19  Let Y(t), −∞ < t < ∞, be a second order stationary process with spectral density f_Y(λ), −∞ < λ < ∞. Set

X(t) = ∫_{t−½}^{t+½} Y(s) ds,    −∞ < t < ∞.

(a) Find a function h(t), −∞ < t < ∞, such that

X(t) = ∫_{−∞}^{∞} h(t − s)Y(s) ds,    −∞ < t < ∞.

(b) Show that X(t), −∞ < t < ∞, is a second order stationary process and find its spectral density function.

20  Let X(t), −∞ < t < ∞, be a second order stationary process having spectral density f_X and set Y(t) = X(t + 1) − X(t), −∞ < t < ∞. Use the formula for the covariance function of the Y(t) process given in Exercise 3 of Chapter 4 to show that Y has the spectral density

f_Y(λ) = 2(1 − cos λ)f_X(λ),    −∞ < λ < ∞.


21  Let h(t), −∞ < t < ∞, be a continuously differentiable function such that

∫_{−∞}^{∞} |h(t)| dt < ∞  and  ∫_{−∞}^{∞} |h′(t)| dt < ∞.

Then the process

X(t) = ∫_{−∞}^{∞} h(t − s) dW(s),    −∞ < t < ∞,

is a second order stationary Gaussian process whose covariance function is given by (71). Show that it has a spectral density function given by (72).

22  Use the result of the previous exercise to compute the spectral density of the process X(t) = W(t + 1) − W(t), −∞ < t < ∞.

23  Find the spectral density of the process X₀(t), −∞ < t < ∞, defined in Exercise 11.

24  Let X(t), −∞ < t < ∞, be the stationary solution to

a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t),

where the left side of this stochastic differential equation is stable. Show that the X(t) process has the spectral density given by

f_X(λ) = σ² / (2π[(a₂ − a₀λ²)² + a₁²λ²]),    −∞ < λ < ∞.

25  A transmitter transmits a constant but unknown signal s. The output of a receiver is the stationary solution X(t), −∞ < t < ∞, to the stable stochastic differential equation

a₀X′(t) + a₁X(t) = s + W′(t).

Suppose that s is estimated from X(t), 0 ≤ t ≤ T, by

ŝ = (a₁/T) ∫₀ᵀ X(t) dt.

Show that ŝ is an unbiased estimator of s (i.e., Eŝ = s), and that

lim_{T→∞} T Var ŝ = σ².

Hint: Use Exercise 7 of Chapter 5 and observe that EX(t) = s/a₁.


26  Let X(t), −∞ < t < ∞, be a second order stationary process having a covariance function of the form

r_X(t) = σ₁² cos λ₁t + σ₂² cos λ₂t,    −∞ < t < ∞,

where 0 < λ₁ < λ₂ (see Example 5 of Chapter 4). Find the spectral distribution function of the X(t) process.


Answers

CHAPTER 1

1  (a) (1 − p)²/[(1 − p)² + pq].  (b) π₀(0)(1 − p − q)(p − q) + q(p + 1 − q).

2  P(x, y) = (x/d)², y = x − 1; 2x(d − x)/d², y = x; ((d − x)/d)², y = x + 1; 0, elsewhere.

3  P(x, y) = f(y), x = 0; pf(y − x + 1) + (1 − p)f(y − x), x ≥ 1.

5  (a) P₀(T₀ = 1) = 1 − p and P₀(T₀ = n) = 0, n ≥ 2.  (b) p(1 − p)^{n−1}.

6  P(X₁ = x) = P(X₀ = x).

10  (a) P_x(T₀ = 1) = 1/3, x = 1, and 0 elsewhere;
P_x(T₀ = 2) = 2/9, x = 2, and 0 elsewhere;
P_x(T₀ = 3) = 4/27, x = 1; 2/9, x = 3; and 0 elsewhere.

(b) P =

0    1    0    0
1/3  0    2/3  0
0    2/3  0    1/3
0    0    1    0

P² =

1/3  0    2/3  0
0    7/9  0    2/9
2/9  0    7/9  0
0    2/3  0    1/3

and P³ =

0     7/9    0      2/9
7/27  0      20/27  0
0     20/27  0      7/27
2/9   0      7/9    0

(c) π₁ = (1/12, 5/12, 5/12, 1/12), π₂ = (5/36, 13/36, 13/36, 5/36),
and π₃ = (13/108, 41/108, 41/108, 13/108).


0 0 1 1 1

1 1 (a) P := s-

2 0 3 0

14  E_x(Xₙ) = d/2 + (x − d/2)(1 − 2/d)ⁿ.

1 0 J. 5 1 s-0

18  (b) p^{n−1}(1 − p).

2 3 0 0 0 0 1 1 0 1 -1L s and p2 = 2 5 J. 1 2 1 5 s- IT 0 1 3 0

and

19  (a) 0 is transient and all other states are recurrent.

1. 2 0 0 2: � �r 2 5 fit 2 2.5 S 0 0

3 0 _1

2 5

285 1

n odd, n even.

(b) POOl = t, PO I = P02 = P03 = :1, and P04 = Pos = P06 = t.

20 (a) 3 and 5 are transient and the other states are recurrent.

191

(b) P{O ll l }(O) = p{o, I }(I ) = 1 , p{0 , 1 }(2) = p{0 , 1 }(4) = 0, p{0 , 1 }(3) = ft, and p{0 ,. 1 }(5) =

-1\.

23  ρ_{0}(x) = 1 − x/2d, 0 ≤ x ≤ 2d.

24  P_x(T₀ < T_d) = [(q/p)ˣ − (q/p)ᵈ]/[1 − (q/p)ᵈ], 0 ≤ x ≤ d.

25  (a) .1.  (b) $99.10.

29  (b) 1 − (6/π²) Σ_{y=0}^{x−1} 1/(y + 1)².

30  (a) P_x(T_a < T_b) = (a + 1)(b − x)/(x + 1)(b − a), a ≤ x ≤ b;
(b) ρ_{x0} = 1/(x + 1), x > 0.

32  4(√5 − 2).

33  (√5 − 1)/2.

38  (a) 0 is transient and all other states are absorbing (and hence recurrent).  (b) 0 and 1 are recurrent and all other states are transient.  (c) 0 is absorbing and all other states are transient.  (d) All states are transient.

CHAPTER 2

1  π(0) = .3, π(1) = .4, and π(2) = .3.

6  When p < ½, the stationary distribution exists and is given by

π(0) = (1 − 2p)/(2(1 − p))  and  π(x) = ((1 − 2p)/2) p^{x−1}/(1 − p)^{x+1},  x ≥ 1.


7  (a) π(x) = C(d, x)/2ᵈ, x = 0, 1, …, d, which is the binomial distribution with parameters d and ½.  (b) mean d/2 and variance d/4.

8  P(x, y) = x/2d, y = x − 1; 1/2, y = x; (d − x)/2d, y = x + 1; 0, elsewhere.

9  π(x) = C(d, x)²/C(2d, d), x = 0, 1, …, d.

13  (1/q)pⁿ.

14  π(x) = (1 − p)pˣ, x ≥ 0.

15  π(x) = 1/d, x ∈ 𝒮.

17  2d.

18  (b) π(x) = 1/2c, 1 ≤ x ≤ c, and π(x) = 1/2d, c + 1 ≤ x ≤ c + d.

19  (a) π₀ = (0, 1/3, 1/3, 1/3, 0, 0, 0) and π₁ = (0, 0, 0, 0, 1/3, 1/3, 1/3).

(b) (lim GnCx, y») = n-+ ClO n

0 1 2 3 4 5 6

0 0 0 0 0 0 0 0

1 2 3 3

TI' IT t t t t t t 0 0 0 0 0 0

3 4 5 6 3 _1 _1 1

IT 1 2 1 2 12 t 0 0 0 t 0 0 0 -l 0 0 0 0 t t t 0 t t t 0 t t -1

20  (a) π₀ = (1/2, 1/2, 0, 0, 0, 0) and π₁ = (0, 0, 6/13, 0, 7/13, 0).

(b) ( lim Gn(x, y») = n-+ co n

0 1 2 3 4 5

0 2 "5 2 "5 0

1 4 55 0

1 2 55

][ 3� �i 3� �; ()

2 1 55 ()

1 is 5S

2 3 4 5 0 0 0 0 0 0 0 0

� 0 -L 0 1 3 1 3 -.H.. 0 -.il.. 0 1 4 3 1 4 3 � 0 7 0 1 3 IT 2..2- 0 -.J..L 0 1 4 3 1 4 3

21 (a) PO()(n = 0) ..:. t, PO(Xn = 2) ..:. 1, PO(Xn =: 4) ..:. t, and PO(){n = 1 ) = PO(Xn = 3) == O.

(b) PO(}{n = 1 ) ..:. t, Po(Xn = 3) ..:. t, and PoC)(n = 0) = PO(Xn = 2) == PO(Xn = 4) = O.

22  (b) 1.  (c) π = (1/3, 1/3, 1/3).  23  (b) 3.


CHAPTER 3

2  P₀₀(t) = μ₁/(λ₀ + λ₁ + μ₁) + [λ₁/(λ₁ + μ₁)]e^{−λ₀t} + [λ₀μ₁/((λ₀ + λ₁ + μ₁)(λ₁ + μ₁))]e^{−(λ₀+λ₁+μ₁)t},  t ≥ 0, and

P₀₁(t) = λ₀/(λ₀ + λ₁ + μ₁) − [λ₀/(λ₀ + λ₁ + μ₁)]e^{−(λ₀+λ₁+μ₁)t},  t ≥ 0.

5  Tₘ has the gamma density with parameters m and λ; i.e.,

f_{Tₘ}(t) = λᵐ t^{m−1} e^{−λt}/(m − 1)!,  t > 0.

7  P(X(τ) = n) = λⁿν/(λ + ν)^{n+1}.

8  P(X(τ) = n) = λⁿ ν^α Γ(n + α)/((λ + ν)^{n+α} n! Γ(α)).

10  (a) P′_{xy}(t) = −μ_y P_{xy}(t) + μ_{y+1}P_{x,y+1}(t), y ≤ x − 1; P′_{xx}(t) = −μ_x P_{xx}(t).
(b) P_{xx}(t) = e^{−μ_x t}.
(c) P_{xy}(t) = μ_{y+1} ∫₀ᵗ e^{−μ_y(t−s)} P_{x,y+1}(s) ds, y < x.
(d) P_{x,x−1}(t) = [μ_x/(μ_{x−1} − μ_x)](e^{−μ_x t} − e^{−μ_{x−1}t}), μ_{x−1} ≠ μ_x,
and P_{x,x−1}(t) = μ_x t e^{−μ_x t}, μ_{x−1} = μ_x.

11  EX(t) = (λ/μ)(1 − e^{−μt}) + xe^{−μt},  Var X(t) = (λ/μ + xe^{−μt})(1 − e^{−μt}).

12  (a) P′_{xy}(t) = (y − 1)λP_{x,y−1}(t) − y(λ + μ)P_{xy}(t) + (y + 1)μP_{x,y+1}(t).

13  (b) S_x(t) = xe^{2(λ−μ)t}[x + ((λ + μ)/(λ − μ))(1 − e^{−(λ−μ)t})], λ ≠ μ,
and S_x(t) = x(x + 2λt), λ = μ.
(c) Var X(t) = x((λ + μ)/(λ − μ))(e^{2(λ−μ)t} − e^{(λ−μ)t}), λ ≠ μ,
and Var X(t) = 2xλt, λ = μ.


14  (a) μ_x = xμ and λ_x = (d − x)λ.
(b) P_{xd}(t) = [λ^{d−x}/(λ + μ)^d](λ + μe^{−(λ+μ)t})ˣ(1 − e^{−(λ+μ)t})^{d−x}.
(c) xe^{−(λ+μ)t} + (dλ/(λ + μ))(1 − e^{−(λ+μ)t}).

16  (a) null recurrent.  (b) transient.

20  π(x) = C(d, x)(λ/(λ + μ))ˣ(μ/(λ + μ))^{d−x}.

21  π(x) = ((λ/μ)ˣ/x!) / Σ_{y=0}^{∞} (λ/μ)ʸ/y!,  x ≥ 0.


22  (b) The average rate at which customers are served equals the arrival rate λ.

CHAPTER 4

5  μ_Y(t) = 0 and r_Y(s, t) = λ(min (s, t) − st).

6  μ_X(t) = t and r_X(s, t) = (1/n)(min (s, t) − st).

7  r_{XY}(s, t) = r_X(t − s + 1).

12  μ_Y(t) = f(t)μ_X(g(t)) and r_Y(s, t) = f(s)f(t)r_X(g(s), g(t)).

13  (a) μ_Y(t) = r_X(0) and r_Y(s, t) = 2(r_X(s, t))².

14  (a) [1/√(2πσ₂²(1 − ρ²))] exp{−[1/(2(1 − ρ²))]((x₂ − μ₂)/σ₂ − ρ(x₁ − μ₁)/σ₁)²}.
(b) μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁).

17  Normal with mean 0 and variance σ²n(n + 1)(2n + 1)/6.

20  (a) μ_X(t) = σ²t and r_X(s, t) = 2σ⁴(min (s, t))².
(b) and (c) μ_X(t) = 0 and r_X(s, t) = σ² min (s, t).
(d) μ_X(t) = 0 and r_X(s, t) = σ²(min (s, t) − st).

CHAPTER 5

2  √3 t(2 − t)/2.

4 P,x(t) = 0 ; s < t ;

,S > t.

and


1 2 (a) ri2)(t - s), 1 3 J,lr(t) = = J,lx and

14  c(W(b) − W(a)).

|t| < 1, and r_X(t) = 0, |t| > 1.

15  EX = 0, Var X = σ²/3, EY = 0, and ρ = √15/4.

16  (a) (σ²/3) min (s³, t³).
(b) (σ²/2)[sin (s + t)/(s + t) + sin (s − t)/(s − t)].
(c) (σ²/6)(|t − s| − 1)²(|t − s| + 2), |s − t| ≤ 1, and 0 elsewhere.

17  (b) (σ²/(2α³))(e^{2αt} − 4e^{αt} + 2αt + 3).

CHAPTER 6

2  (a) V(t) = v₀e^{αt} + (1/m)∫₀ᵗ e^{α(t−s)} dW(s), where α = −f/m.
(c) X(t) = x₀ + (v₀/α)(e^{αt} − 1) + (1/(mα))∫₀ᵗ (e^{α(t−u)} − 1) dW(u).
(d) X(t) = x₀ + ∫₀ᵗ V(u) du.
(e) μ_X(t) = x₀ + (v₀/α)(e^{αt} − 1).

3  (a) V₀(t) = (1/m)∫_{−∞}^{t} e^{α(t−s)} dW(s).
(b) μ_{V₀}(t) = 0 and r_{V₀}(t) = (σ²/(2mf))e^{−(f/m)|t|}.
(e) X₀(t) = [(e^{αt} − 1)/(mα)]∫_{−∞}^{0} e^{−αu} dW(u) + (1/(mα))∫₀ᵗ (e^{α(t−u)} − 1) dW(u).
(f) μ_{X₀}(t) = 0, Var X₀(t) = (σ²m/f³)(e^{αt} − 1 − αt).

4  Yes. X₀(t) = −(1/a₀)∫_t^{∞} e^{α(t−s)} dW(s).


5  (a) A solution to the indicated equation on an interval containing the point t₀ is defined as a process having continuous sample functions which satisfies the equation

a₀(X(t) − X(t₀)) + a₁∫_{t₀}^{t} X(s) ds = c(t − t₀) + W(t) − W(t₀)

on that interval.
(c) X₀(t) = −c/(a₀α) + (1/a₀)∫_{−∞}^{t} e^{α(t−s)} dW(s).

6  (a) Var X(t) = σ²(t + 2e^{−t} − e^{−2t}/2 − 3/2).
(b) Var X(t) = (σ²/12)(1 − 6e^{−2t} + 8e^{−3t} − 3e^{−4t}),
r_{X₀}(t) = (σ²/12)(2e^{−|t|} − e^{−2|t|}).
(c) Var X(t) = (σ²/80)[1 + e^{−2t}(4 cos t − 2 sin t − 5)],
r_{X₀}(t) = (σ²/80)e^{−|t|}(cos (|t|/2) + 2 sin (|t|/2)).
(d) Var X(t) = (σ²/4)[1 − e^{−2t}(2t² + 2t + 1)].
(e) Var X(t) = (σ²/4)(2t − sin 2t).

9  (b) φ₁″(0) = −a₂/a₀, Var X₀′(t) = σ²/(2a₀a₁).

11  (a) μ_{X₀}(t) = 1, r_{X₀}(t) = e^{−|t|}(|t| + 1)/2.
(b) μ_X(t) = 1 − e^{−t},
r_X(s, t) = (e^{−|t−s|}/2)[|t − s| + 1 − (s + t + 1)e^{−2 min (s,t)}].

13  (a) (r(1)/r(0))X(0),  (r²(0) − r²(1))/r(0).
(b) (r(1)/r(0))X(0) + (r′(1)/r″(0))X′(0),
(r²(0) − r²(1))/r(0) + (r′(1))²/r″(0).
(c) [∫₀¹ r(t) dt/(r(0) + r(1))](X(0) + X(1)),
∫₀¹∫₀¹ r(s − t) ds dt − 2(∫₀¹ r(t) dt)²/(r(0) + r(1)).

16  X̂(t₁ + τ) = e^{ατ}X(t₁), where α = −a₁/a₀.

17  (a) X(t₁) + (1 − e^{−τ})X′(t₁).
(b) (2e^{−τ} − e^{−2τ})X(t₁) + (e^{−τ} − e^{−2τ})X′(t₁).
(c) e^{−τ}(cos (τ/2) + 2 sin (τ/2))X(t₁) + 2e^{−τ} sin (τ/2) X′(t₁).
(d) e^{−τ}(1 + τ)X(t₁) + τe^{−τ}X′(t₁).
(e) cos τ X(t₁) + sin τ X′(t₁).


19  (a) h(t) = 1, −½ ≤ t ≤ ½, and h(t) = 0 elsewhere.
(b) f_X(λ) = [sin (λ/2)/(λ/2)]² f_Y(λ).

22  (σ²/π)(1 − cos λ)/λ².

26  F(λ) = 0, −∞ < λ < −λ₂;
σ₂²/2, −λ₂ ≤ λ < −λ₁;
(σ₁² + σ₂²)/2, −λ₁ ≤ λ < λ₁;
σ₁² + σ₂²/2, λ₁ ≤ λ < λ₂;
σ₁² + σ₂², λ₂ ≤ λ < ∞.


Xₙ

P(x, y)

1_y(x)

G(x, y)

Nₙ(y)
Gₙ(x, y)

m_y

Glossary of Notation

CHAPTER 1

state space

state of system at time n

initial distribution

probability of going from x to y in one step

probability of going from x to y in n steps

probability of an event defined in terms of a chain starting at x

hitting time of the set A

hitting time of the state y

probability that a chain starting at x will ever visit y

function that is one if x = y and zero if x ≠ y

total number of visits to y

expectation of a random vari­able defined in terms of a chain starting at x

expected number of visits to y for a chain starting at x

set of transient states

set of recurrent states

probability that a chain starting at x will eventually be absorbed into the closed set C

C HAPTE R 2

stationary distribution

number of visits to y by time n

expected number of visits to y by time n for a chain starting at x

mean return time to a recurrent state y


time of rth visit to y

waiting time between the (r - 1 )th visit to y and the r th visit to y

set of positive recurrent states

period of the state x

period of an irreducible chain

CHAPTER 3

state space

time of nth jump

state of system at time t

probability of an event defined in terms of a process starting at x

expectation of a random vari­able defined in terms of a process starting at x

distribution function of time to first jump for a process starting at a non-absorbing state x

one if x = y and zero if x ≠ y

probability that a process starting at a non-absorbing state x will go to y at its first transition (Q_{xy} = δ(x, y) if x is an absorbing state)

probability that a process starting at x will be in state y at time t

initial distribution of the process

exponential parameter in distribution of time to first jump for a process starting at a non-absorbing state x (q_x = 0 if x is an absorbing state)

infinitesimal parameters of the process defined by q_{xy} = P′_{xy}(0)


T_y

𝒯
X(t)
μ_X(t)
r_X(s, t)

μ_X

first time t ≥ τ₁ that the process is in state y

δ(x, y) if x is an absorbing state; otherwise the probability that the process visits y at some time t ≥ τ₁

stationary distribution of the process

mean return time to a non-absorbing recurrent state x

CHAPTER 4

time parameter set

value of process at time t

mean of X(t)
covariance between X(s) and X(t)

mean when independent of t

X′(t)
X⁽ⁿ⁾(t)

Ŷ
f_X
F_X


covariance between X(s) and X(s + t) for a second order stationary process

covariance between X(s) and Y(t)

CHAPTER 5

derivative of the X(t) process

nth derivative of the X(t) process

CHAPTER 6

collection of allowable estimators

estimator of Y
spectral density function

spectral distribution function


Absorbing state, chain, 8
process, 85

Absorption probabilities, 25

Allowable estimators, 171

Aperiodic chain, 73

Aperiodicity of a process, 104

Auto-covariance function, 111
second order stationary process, 113

Backward equation, 89
birth and death process, 92

Birth and death chain, 9
period, 73
positive recurrence, 66
recurrence, 32
stationary distribution, 50

Birth and death process, 89
stationary distribution, 104
transience, null recurrence, and positive recurrence, 104

Birth rates, 89

Bochner's theorem, 182

Bounded convergence theorem, 63

Branching chain, 10
extinction probability, 11, 34

Branching process, 91
immigration, 97

Brownian motion, 123
non-differentiability, 141

Chapman-Kolmogorov equation, 87

Characteristic polynomial, 160

Closed set of states, 22
irreducible, 23

Continuous parameter process, 111


Index

Covariance function, 111
auto-, 111
continuous, 128
cross-, 118
nonnegative definite property, 112
second order stationary process, 113
symmetry, 112

Cross-covariance function, 118
continuity, 129

Death rates, 89

Differentiable second order process, 135

Discrete parameter process, 111

Distribution of time to first jump, 85
exponential, 86

Divisor, 72

Doubly stochastic transition function, 82

Ehrenfest chain, 7
modified, 52

Einstein, Albert, 123

Embedded chain, 102

Empirical distribution function, 125

Estimation, 170
general principle, 173
linear, 172
mean square error, 171
nonlinear, 172
optimal, 171

Estimation of mean, 149

Estimation of signal, 188

Expected number of visits, 19
by time n, 57

Explosions, 85

Exponential parameter, 87


Filtering, 171
Finite linear combination, 112
Forward equation, 89
birth and death process, 92
Frequency response function, 180

Gambler's ruin chain, 8
Gaussian distribution, 120
joint, 121
Gaussian process, 119
differentiation, 139
integration, 134
sample function continuity, 131
strictly stationary, 122
Greatest common divisor, 72

Hitting time, chain, 14
process, 102

Impulse response function, 160
Indicator function, 18
Infinite server queue, 99
positive recurrence, 106
Infinitesimal parameters, 89
Initial distribution, chain, 5, 6
process, 86
Irreducible chain, 23
Irreducible closed set, 23
Irreducible process, 102

Jump process, 84
explosive, 86
non-explosive, 85
pure, 85
Jump times, 84

Langevin's velocity process, 184
Lévy, Paul, 123
Linear birth process, 98
Linear estimation, 172

Markov chain, 1
aperiodic, 73
irreducible, 23
null recurrent, 62
periodic, 73
positive recurrent, 62
recurrent, 21
transient, 21
Markov property, chain, 1
process, 86
Markov pure jump process, 86
aperiodicity, 104
irreducible, 102
null recurrent, 103
positive recurrent, 103
recurrent, 102
transient, 102
Martingale, 27
Mean function, 111
continuity, 128
Mean return time, chain, 58
process, 103
Mean square continuity, 128
Mean square convergence, 138, 172
Mean square error, 171

N server queue, 106
Nonlinear estimation, 172
Nonnegative definite property, 112
Normal distribution, joint, 121
Normal process, 120
Null recurrent chain, 62
Null recurrent process, 103
Null recurrent state, chain, 60
process, 103
Number of visits, 18
by time n, 57

One-step transition probabilities, 5
Optimal estimator, 171
Ornstein-Uhlenbeck process, 184
Orthogonal random variables, 173

Period of a chain, 73
Period of a state, 72
Periodic chain, 73
Poisson process, 96
conditional distribution of arrivals, 99
mean and covariance functions, 115
Positive recurrent chain, 62
Positive recurrent process, 103
Positive recurrent state, chain, 61
process, 103
Prediction, 170
Probability generating function, 34
Pure birth process, 90
linear, 98
Pure death process, 90


Queuing chain, 9
positive recurrence, 69
recurrence, 36
Queuing process, infinite server, 99
N server, 106

Random walk, 7
simple, 7
Recurrent chain, 21
Recurrent process, 102
Recurrent state, chain, 17
process, 102

Sample function continuity, 130
Schwarz's inequality, 113
Second order process, 111
derivative, 135
higher derivatives, 137
stationary, 112
Second order stationary process, 112
Spectral density function, 177, 183
Spectral distribution function, 178, 182
Stability, 160
State space, chain, 1
process, 84
Stationary distribution, chain, 47
concentrated on a closed set, 67
process, 102
Stationary transition probabilities, 1
Steady state distribution, 47
Stochastic differential equation, 152
nth order, 153, 165, 166
Stochastic process, 111
Strictly stationary process, 122
Strong law of large numbers, 58

Time parameter set, 111
Transient chain, 21
Transient process, 102
Transient state, chain, 17
process, 102
Transition function, chain, 5, 6
m-step, 13
process, 86
Transition matrix, 16
n-step, 16
Transition probabilities, chain, 1
one-step, 5
process, 85
stationary, 1
Two-state birth and death process, 92
mean and covariance function, 114
Two-state Markov chain, 2, 17, 49

White noise, 142
spectral density, 182

Wiener, Norbert, 123
Wiener process, 123
continuity of paths, 132
covariance function, 124
non-differentiability, 140
white noise, 142

Wiener-Lévy process, 123
