8/9/2019 Stochastic modelling Notes Discrete
MAST30001 Stochastic Modelling
Lecturer: Nathan Ross
Administration
LMS - announcements, grades, course documents
Lectures/Practicals
Student-staff liaison committee (SSLC) representative
Slide 1
Modelling
We develop an imitation of the system. It could be, for example,
a small replica of a marina development,
a set of equations describing the relations between stock prices,
a computer simulation that reproduces a complex system (think: the paths of planets in the solar system).
We use a model
to understand the evolution of a system,
to understand how outputs relate to inputs, and
to decide how to influence a system.
Slide 2
Why do we model?
We want to understand how a complex system works. Real-world experimentation can be
too slow,
too expensive,
possibly too dangerous,
and may not deliver insight.
The alternative is to build a physical, mathematical or computational model that captures the essence of the system that we are interested in (think: NASA).
Slide 3
Why a stochastic model?
We want to model such things as
traffic on the Internet,
stock prices and their derivatives,
waiting times in healthcare queues,
reliability of multicomponent systems,
interacting populations,
epidemics,
where the effects of randomness cannot be ignored.
Slide 4
Good mathematical models
capture the non-trivial behaviour of a system,
are as simple as possible,
replicate empirical observations,
are tractable - they can be analysed to derive the quantities of interest, and
can be used to help make decisions.
Slide 5
Stochastic modelling
Stochastic modelling is about the study of random experiments. For example,
toss a coin once, toss a coin twice, toss a coin infinitely many times,
the lifetime of a randomly selected battery (quality control),
the operation of a queue over the time interval [0, ∞) (service),
the changes in the US dollar - Australian dollar exchange rate from 2006 onwards (finance),
the positions of all iPhones that make connections to a particular telecommunications company over the course of one hour (wireless tower placement),
the network friend structure of Facebook (ad revenue).
Slide 6
Stochastic modelling
We study a random experiment in the context of a Probability Space (Ω, F, P). Here,
the sample space Ω is the set of all possible outcomes of our random experiment,
the class of events F is a set of subsets of Ω. We view these as events we can see or measure, and
P is a probability measure defined on the elements of F.
Slide 7
The sample space
We need to think about the sets of possible outcomes for the random experiments. For those discussed above, these could be
{H, T}, {(H, H), (H, T), (T, H), (T, T)}, the set of all infinite sequences of Hs and Ts,
[0, ∞),
the set of piecewise-constant functions from [0, ∞) to Z+,
the set of continuous functions from [0, ∞) to IR+,
∪_{n≥0} {((x1, y1), . . . , (xn, yn))}, giving the locations of the phones when they connected,
the set of simple networks with number of vertices equal to the number of users: edges connect friends.
Slide 8
Review of basic notions of set theory
A ⊆ B: A is a subset of B, or: if A occurs, then B occurs.
A ∪ B = {ω : ω ∈ A or ω ∈ B} = B ∪ A. Union of sets (events): at least one occurs. A1 ∪ A2 ∪ · · · ∪ An = ∪_{i=1}^n Ai.
A ∩ B = {ω : ω ∈ A and ω ∈ B} = B ∩ A = AB. Intersection of sets (events): both occur. A1 ∩ A2 ∩ · · · ∩ An = ∩_{i=1}^n Ai.
A^c = {ω : ω ∉ A}. Complement of a set/event: the event doesn't occur.
∅: the empty set or impossible event.
Slide 9
The class of events F
For discrete sample spaces, F is typically the set of all subsets.
Example: Toss a coin once, F = {∅, {H}, {T}, {H, T}}.
For continuous sample spaces, the situation is more complicated:
Slide 10
The class of events F
Let S be the circle of radius 1.
We say two points on S are in the same family if you can get from one to the other by taking steps of arclength 1 around the circle. Each family chooses a single member to be head.
If X is a point chosen uniformly at random from the circle, what is the chance X is the head of its family?
Slide 11
The class of eventsF
A= {X is head of its family}. Ai= {X is isteps clockwise from its family}. Bi= {X is i steps counterclockwise from its family}. By uniformity, P(A) =P(Ai) =P(Bi),BUT
law of total probability:
1 =P(A) +i=1
(P(Ai) +P(Bi))!
The issue is that the event A is not one we can seeormeasuresoshould not be included inF.
Slide 12
The class of eventsF
These kinds of issues are technical to resolve and are dealt with in
later probability or analysis subjects which use measure theory.
Slide 13
The probability measure P
The probability measure P on (Ω, F) is a set function from F to [0, 1] satisfying
P1. P(A) ≥ 0 for all A ∈ F [probabilities measure long-run percentages or certainty],
P2. P(Ω) = 1 [there is a 100% chance something happens],
P3. countable additivity: if A1, A2, . . . are disjoint events in F, then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) [think about it in terms of frequencies].
Slide 14
How do we specify P?
The modelling process consists of
defining the values of P(A) for some basic events A ∈ F, and
deriving P(B) for the other, more complicated events B ∈ F from the axioms above.
Example: Toss a fair coin 1000 times. Any length-1000 sequence of Hs and Ts has chance 2^(-1000).
What is the chance there are more than 600 Hs in the sequence?
What is the chance the first time the proportion of heads exceeds the proportion of tails occurs after toss 20?
Slide 15
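To make the first question concrete, the tail probability of more than 600 heads can be computed exactly by summing the binomial pmf; a quick sketch (the variable names are ours, not from the notes):

```python
from math import comb

# P(more than 600 heads in 1000 tosses of a fair coin):
# sum C(1000, k) / 2^1000 over k = 601..1000.
n = 1000
p_more_than_600 = sum(comb(n, k) for k in range(601, n + 1)) / 2**n
print(p_more_than_600)  # tiny (order 10^-10): such a deviation is extremely unlikely
```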
Properties of P
P(∅) = 0.
P(A^c) = 1 - P(A).
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Slide 16
Conditional probability
Let A, B ∈ F be events with P(B) > 0. Supposing we know that B occurred, how likely is A given that information? That is, what is the conditional probability P(A|B)?
For a frequency interpretation, consider the situation where we have n trials and B has occurred nB times. What is the relative frequency of A in these nB trials? The answer is
nAB/nB = (nAB/n)/(nB/n) ≈ P(A ∩ B)/P(B).
Hence, we define
P(A|B) = P(A ∩ B)/P(B).
We need a more sophisticated definition if we want to define the conditional probability P(A|B) when P(B) = 0.
Slide 17
Example:
Tickets are drawn consecutively and without replacement from a box of tickets numbered 1-10. What is the chance the second ticket is even numbered given the first is
even?
labelled 3?
Slide 18
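A brute-force check of both answers, enumerating all equally likely ordered pairs of distinct tickets (a sketch; the notes would solve this by direct conditioning):

```python
from fractions import Fraction
from itertools import permutations

# All ordered (first, second) draws without replacement from tickets 1..10.
pairs = list(permutations(range(1, 11), 2))

def cond_prob(event, given):
    # P(event | given) as an exact fraction over equally likely pairs
    g = [pr for pr in pairs if given(pr)]
    return Fraction(sum(1 for pr in g if event(pr)), len(g))

second_even = lambda pr: pr[1] % 2 == 0
print(cond_prob(second_even, lambda pr: pr[0] % 2 == 0))  # 4/9
print(cond_prob(second_even, lambda pr: pr[0] == 3))      # 5/9
```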
Bayes' formula
Law of Total Probability: let B1, B2, . . . , Bn be mutually exclusive events with A ⊆ ∪_{j=1}^n Bj; then
P(A) = Σ_{j=1}^n P(A|Bj)P(Bj).
With the same assumptions as for the Law of Total Probability,
P(Bj|A) = P(Bj ∩ A)/P(A) = P(A|Bj)P(Bj) / Σ_{k=1}^n P(A|Bk)P(Bk).
Slide 19
Example:
A disease affects 1/1000 newborns and shortly after birth a baby is screened for this disease using a cheap test that has a 2% false positive rate (the test has no false negatives). If the baby tests positive, what is the chance it has the disease?
Slide 20
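The screening example is a direct application of Bayes' formula with B1 = {diseased}, B2 = {healthy}; a small sketch:

```python
# Bayes: P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)]
p_disease = 1 / 1000
p_pos_given_disease = 1.0   # no false negatives
p_pos_given_healthy = 0.02  # 2% false positive rate

p_disease_given_pos = (p_pos_given_disease * p_disease) / (
    p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
)
print(round(p_disease_given_pos, 4))  # 0.0477: most positives are false positives
```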
Independent events
Events A and B are said to be independent if P(A ∩ B) = P(A)P(B). If P(B) ≠ 0 or P(A) ≠ 0, then this is the same as P(A|B) = P(A) or P(B|A) = P(B), respectively.
Events A1, . . . , An are independent if, for any subset {i1, . . . , ik} of {1, . . . , n},
P(Ai1 ∩ · · · ∩ Aik) = P(Ai1) · · · P(Aik).
Slide 21
Random variables
A random variable (rv) on a probability space (Ω, F, P) is a function X : Ω → IR. Usually, we want to talk about the probabilities that the values of random variables lie in sets of the form (a, b) = {x : a < x < b}.
Distribution Functions
The function FX(t) = P(X ≤ t) = P({ω : X(ω) ∈ (-∞, t]}), which maps R to [0, 1], is called the distribution function of the random variable X. Any distribution function F
F1. is non-decreasing,
F2. is such that F(x) → 0 as x → -∞ and F(x) → 1 as x → ∞, and
F3. is right-continuous, that is, lim_{h→0+} F(t + h) = F(t) for all t.
Slide 23
Distribution Functions
We say that
the random variable X is discrete if it can take only countably-many values, with P(X = xi) = pi > 0 and Σ_i pi = 1. Its distribution function FX(t) is commonly a step function.
the random variable X is continuous if FX(t) is absolutely continuous, that is, if there exists a function fX(t) that maps R to R+ such that FX(t) = ∫_{-∞}^t fX(u) du.
A mixed random variable has some points that have positive probability and also some continuous parts.
Slide 24
Examples of distributions
Examples of discrete random variables: binomial, Poisson, geometric, negative binomial, discrete uniform
http://en.wikipedia.org/wiki/Category:Discrete_distributions
Examples of continuous random variables: normal, exponential, gamma, beta, uniform on an interval (a, b)
http://en.wikipedia.org/wiki/Category:Continuous_distributions
Slide 25
Random Vectors
A random vector X = (X1, . . . , Xd) is a measurable mapping of (Ω, F) to IR^d, that is, for each Borel set B ⊆ IR^d, {ω : X(ω) ∈ B} ∈ F. The distribution function of a random vector is
FX(t) = P(X1 ≤ t1, . . . , Xd ≤ td), t = (t1, . . . , td) ∈ R^d.
It follows that probabilities of the form P(s1 < X1 ≤ t1, . . . , sd < Xd ≤ td) can be computed from FX.
Independent Random Variables
The random variables X1, . . . , Xd are called independent if FX(t) = FX1(t1) · · · FXd(td) for all t = (t1, . . . , td). Equivalently,
the events {X1 ∈ B1}, . . . , {Xd ∈ Bd} are independent for all Borel sets B1, . . . , Bd ⊆ R, or, in the absolutely-continuous case,
fX(t) = fX1(t1) · · · fXd(td) for all t = (t1, . . . , td).
Slide 27
Revision Exercise
For bivariate random variables (X, Y) with density function
f(x, y) = 2x + 2y - 4xy for 0 < x < 1, 0 < y < 1.
Expectation of X
For a discrete, continuous or mixed random variable X that takes on values in the set SX, the expectation of X is
E(X) = ∫_{SX} x dFX(x).
The integral on the right-hand side is a Lebesgue-Stieltjes integral. It can be evaluated as
Σ_i xi P(X = xi), if X is discrete,
∫ x fX(x) dx, if X is absolutely continuous.
In second year, we required that the integral be absolutely convergent. In this course we will allow the expectation to be infinite, provided that we never get in a situation where we have ∞ - ∞.
Slide 29
Expectation of g(X)
For a measurable function g that maps SX to some other set SY, Y = g(X) is a random variable taking values in SY and
E(Y) = E(g(X)) = ∫_{SX} g(x) dFX(x).
We can also evaluate E(Y) by calculating its distribution function FY(y) and then using the expression
E(Y) = ∫_{SY} y dFY(y).
Slide 30
Properties of Expectation
E(aX + bY) = aE(X) + bE(Y).
If X ≤ Y, then E(X) ≤ E(Y).
If X ≡ c, then E(X) = c.
Slide 31
Moments
The kth moment of X is E(X^k).
The kth central moment of X is E[(X - E(X))^k].
The variance V(X) of X is the second central moment, E(X^2) - (E(X))^2.
V(cX) = c^2 V(X).
If X and Y have finite means and are independent, then E(XY) = E(X)E(Y).
If X and Y are independent (or uncorrelated), then V(X ± Y) = V(X) + V(Y).
Slide 32
Conditional Probability
The conditional probability of event A given X is a random variable (since it is a function of X). We write it as P(A|X).
For a real number x, if P(X = x) > 0, then
P(A|x) = P(A ∩ {X = x})/P({X = x}).
If P(X = x) = 0, then
P(A|x) = lim_{ε→0+} P(A ∩ {X ∈ (x - ε, x + ε)})/P({X ∈ (x - ε, x + ε)}).
Slide 33
Conditional Distribution
The conditional distribution function FY|X(y|X) of Y evaluated at the real number y is given by P({Y ≤ y}|X), where P({Y ≤ y}|x) is defined on the previous slide.
If (X, Y) is absolutely continuous, then the conditional density of Y given that X = x is fY|X(y|x) = f(X,Y)(x, y)/fX(x).
Slide 34
Conditional Expectation
The conditional expectation E(Y|X) = ψ(X), where
ψ(x) = E(Y|X = x)
= Σ_j yj P(Y = yj|X = x), if Y is discrete,
= ∫_{SY} y fY|X(y|x) dy, if Y is absolutely continuous.
Slide 35
Properties of Conditional Expectation
Linearity: E(aY1 + bY2|X) = aE(Y1|X) + bE(Y2|X).
Monotonicity: if Y1 ≤ Y2, then E(Y1|X) ≤ E(Y2|X).
E(c|X) = c.
E(E(Y|X)) = E(Y).
For any measurable function g, E(g(X)Y|X) = g(X)E(Y|X).
E(Y|X) is the best predictor of Y from X in the mean-square sense. This means that, for all random variables Z = g(X), the expected quadratic error E((g(X) - Y)^2) is minimised when g(X) = E(Y|X) (see Borovkov, page 57).
Slide 36
Exercise
Let Ω = {a, b, c, d}, P({a}) = 1/2, P({b}) = P({c}) = 1/8 and P({d}) = 1/4. Define random variables
Y(ω) = 1 if ω = a or b, 0 if ω = c or d,
X(ω) = 2 if ω = a or c, 5 if ω = b or d.
Compute E(X), E(X|Y) and E(E(X|Y)).
Slide 37
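A direct computation of this exercise in exact arithmetic; the tower property E(E(X|Y)) = E(X) serves as a built-in check (sketch code, names are ours):

```python
from fractions import Fraction as F

P = {'a': F(1, 2), 'b': F(1, 8), 'c': F(1, 8), 'd': F(1, 4)}
Y = {'a': 1, 'b': 1, 'c': 0, 'd': 0}
X = {'a': 2, 'b': 5, 'c': 2, 'd': 5}

EX = sum(P[w] * X[w] for w in P)

def E_X_given_Y(y):
    # E(X | Y = y) = sum over {w : Y(w) = y} of X(w) P(w) / P(Y = y)
    py = sum(P[w] for w in P if Y[w] == y)
    return sum(P[w] * X[w] for w in P if Y[w] == y) / py

tower = sum(sum(P[w] for w in P if Y[w] == y) * E_X_given_Y(y) for y in (0, 1))
print(EX, E_X_given_Y(1), E_X_given_Y(0), tower)  # 25/8, 13/5, 4, 25/8
```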
Example
The number of storms, N, in the upcoming rainy season is distributed according to a Poisson distribution with a parameter value Λ that is itself random. Specifically, Λ is uniformly distributed over (0, 5). The distribution of N is called a mixed Poisson distribution.
1. Find the probability there are at least two storms this season.
2. Calculate E(N|Λ) and E(N^2|Λ).
3. Derive the mean and variance of N.
Slide 38
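The first part can be checked numerically: given Λ = λ, N is Poisson(λ), so P(N ≥ 2) = 1 − E[e^(−Λ)(1 + Λ)], and integrating against the uniform density 1/5 on (0, 5) gives the closed form (3 + 7e^(−5))/5. A sketch using a midpoint Riemann sum (our code, not from the notes):

```python
from math import exp

# P(N >= 2) = 1 - (1/5) * integral over (0, 5) of e^(-lam) (1 + lam) d lam,
# evaluated with the midpoint rule.
m = 100000
h = 5 / m
integral = sum(exp(-(k + 0.5) * h) * (1 + (k + 0.5) * h) for k in range(m)) * h
p_at_least_2 = 1 - integral / 5
print(round(p_at_least_2, 4))  # 0.6094; closed form (3 + 7*e**-5)/5

# For parts 2-3: E(N | Lambda) = Lambda, so E(N) = E(Lambda) = 2.5, and
# Var(N) = E(Lambda) + Var(Lambda) = 2.5 + 25/12.
```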
Exercise
The joint density of X and Y is given by
fX,Y(x, y) = e^(-x/y) e^(-y) / y, x > 0, y > 0.
Calculate E[X|Y] and then calculate E[X].
Slide 39
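The density factors as fY(y) = e^(-y) (so Y is Exp(1)) times fX|Y(x|y) = e^(-x/y)/y (so, given Y = y, X is exponential with mean y). Hence E[X|Y] = Y and E[X] = E[Y] = 1. A Monte Carlo sanity check under that reading (a sketch, not the notes' derivation):

```python
import random

random.seed(1)

# Simulate (X, Y): Y ~ Exp(1); given Y = y, X ~ Exp(mean y), i.e. rate 1/y.
n = 100000
xs = []
for _ in range(n):
    y = random.expovariate(1.0)
    xs.append(random.expovariate(1.0 / y))

print(sum(xs) / n)  # should be close to E[X] = E[Y] = 1
```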
Limit Theorems (Borovkov §2.9)
The Law of Large Numbers (LLN) states that if X1, X2, . . . are independent and identically-distributed with mean μ, then
X̄n ≡ (1/n) Σ_{j=1}^n Xj → μ
as n → ∞.
In the strong form, this is true almost surely, which means that it is true on a set A of sequences x1, x2, . . . that has probability one. In the weak form, this is true in probability, which means that, for all ε > 0,
P(|X̄n - μ| > ε) → 0 as n → ∞.
Slide 40
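A quick simulation illustrating the LLN for Xi uniform on (0, 1), which have mean μ = 1/2: sample means settle near μ as n grows (sketch code, names are ours):

```python
import random

random.seed(0)

def sample_mean(n):
    # X-bar_n for n i.i.d. Uniform(0, 1) draws, which have mean 1/2
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, sample_mean(n))  # the sample means approach 0.5
```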
Limit Theorems (Borovkov §2.9)
The Central Limit Theorem (CLT) states that if X1, X2, . . . are independent and identically-distributed with mean μ and variance σ^2, then for any x,
P( (X̄n - μ)/(σ/√n) ≤ x ) → Φ(x)
as n → ∞, where Φ is the standard normal distribution function.
The Poisson Limit Theorem states that if X1, X2, . . . are independent Bernoulli random variables with P(Xi = 1) = 1 - P(Xi = 0) = pi, then X1 + X2 + · · · + Xn is well-approximated by a Poisson random variable with parameter λ = p1 + · · · + pn. Specifically, with W = X1 + X2 + · · · + Xn, then, for any Borel set B ⊆ R,
P(W ∈ B) ≈ P(Y ∈ B), where Y ~ Po(λ).
There is, in fact, a bound on the accuracy of this approximation:
|P(W ∈ B) - P(Y ∈ B)| ≤ Σ_{i=1}^n pi^2 / max(1, λ).
Slide 42
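The approximation can be checked exactly for small n by convolving the Bernoulli distributions and comparing with the Po(λ) pmf in total variation; the distance also respects the cruder Le Cam bound Σ pi² (a sketch with illustrative pi of our choosing; the bound on the slide is sharper):

```python
from math import exp, factorial

ps = [0.1, 0.05, 0.2, 0.02, 0.15]   # illustrative Bernoulli success probabilities
lam = sum(ps)

# Distribution of W = X1 + ... + Xn by convolution of the Bernoullis.
dist = [1.0]
for p in ps:
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)
        new[k + 1] += q * p
    dist = new

poisson = [exp(-lam) * lam**k / factorial(k) for k in range(len(dist))]

# Total variation distance over {0, ..., n} (the Poisson tail beyond n is tiny).
tv = 0.5 * sum(abs(w - y) for w, y in zip(dist, poisson))
print(tv, sum(p * p for p in ps))  # tv sits below the Le Cam bound sum(pi^2)
```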
Example
Suppose there are three ethnic groups, A (20%), B (30%) and C (50%), living in a city with a large population. Suppose 0.5%, 1% and 1.5% of people in A, B and C respectively are over 200cm tall.
If we know that of 300 selected, 50, 50 and 200 people are from A, B and C, what is the probability that at least four will be over 200cm?
Slide 43
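Under the slide's numbers, the count of tall people is a sum of 300 independent Bernoullis, with λ = 50(0.005) + 50(0.01) + 200(0.015) = 3.75, so the Poisson Limit Theorem gives P(at least four) ≈ 1 − Σ_{k=0}^3 e^(−λ) λ^k / k!. A sketch:

```python
from math import exp, factorial

# Poisson approximation: lambda = sum of the 300 Bernoulli parameters.
lam = 50 * 0.005 + 50 * 0.01 + 200 * 0.015   # = 3.75
p_at_least_4 = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(4))
print(round(p_at_least_4, 4))  # 0.5162
```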
Stochastic Processes (Borovkov §2.10)
A collection of random variables {Xt, t ∈ T} (or {X(t), t ∈ T}) on a common probability space (Ω, F, P) is called a stochastic process. The index variable t is often called time.
If T = {1, 2, . . .} or {. . . , -2, -1, 0, 1, 2, . . .}, the process is a discrete-time process.
If T = IR or [0, ∞), the process is a continuous-time process.
If T = IR^d, then the process is a spatial process, for example temperature at t ∈ T ⊆ IR^2, which could be, say, the University campus.
Slide 44
Examples of Stochastic Processes
If Xt ~ N(0, 1) for all t, then Xt is called a Gaussian process. Different processes can be modelled by making different assumptions about the dependence between the Xt for different t.
Standard Brownian Motion is a Gaussian process where, for any 0 ≤ s1 < t1 ≤ s2 < t2 ≤ · · ·, the increments Xt1 - Xs1, Xt2 - Xs2, . . . are independent, with Xt - Xs ~ N(0, t - s).
Xt is the number of sales of an item up to time t.
{Xt, t ≥ 0} is called a counting process.
Slide 46
Examples of Stochastic Processes
Xt is the number of people in a queue at time t.
Slide 47
Interpretations
We can think of Ω as consisting of the set of sample paths ω = {Xt : t ∈ T}, that is, a set of sequences if T is discrete or a set of functions if T is continuous. Each ω has a value at each time point t ∈ T. With this interpretation,
For a fixed ω, we can think of t as a variable and X(t) as a deterministic function (realization, trajectory, sample path) of the process.
If we allow ω to vary, we get a collection of trajectories.
For fixed t, with ω varying, we see that Xt(ω) is a random variable.
If both ω and t are fixed, then Xt(ω) is a real number.
Slide 48
Examples of Stochastic Processes
If Xt is a counting process:
For fixed ω, Xt(ω) is a non-decreasing step function of t.
For fixed t, Xt(ω) is a non-negative integer-valued random variable.
For s < t, Xt - Xs is the number of events counted in the interval (s, t].
Knowing just the one-dimensional (individual) distributions of Xt for all t is not enough to describe a stochastic process. To specify the complete distribution of a stochastic process {Xt, t ∈ T}, we need to know the finite-dimensional distributions, that is, the family of joint distribution functions F_{t1, t2, . . . , tk}(x1, . . . , xk) of Xt1, . . . , Xtk for all k ≥ 1 and t1, . . . , tk ∈ T.
Slide 50
Discrete-Time Markov Chains
We are frequently interested in applications where we have a sequence X1, X2, . . . of outputs (which we model as random variables) in discrete time. For example,
DNA: A (adenine), C (cytosine), G (guanine), T (thymine).
Texts: Xj takes values in some alphabet, for example {A, B, . . . , Z, a, . . .}. Developing and testing compression software. Cryptology: codes, encoding and decoding. Attributing manuscripts.
Slide 51
Independence?
Is it reasonable to assume that neighbouring letters are independent?
Text T = {a1 · · · an} of length n. Let nα = #{i ≤ n : ai = α} and nαβ = #{i ≤ n - 1 : ai ai+1 = αβ}. Assuming that T is random, we expect nα/n ≈ P(letter = α) and nαβ/n ≈ P(two letters = αβ).
If letters were independent, we would have P(two letters = αβ) = P(letter = α)P(letter = β), so we would expect that nαβ/n ≈ (nα/n)(nβ/n).
However, let αβ = aa: P(letter = a) ≈ 0.08, but aa is very rare. We conclude that assuming independence does not lead to a good model for text.
Slide 52
The Markov Property
The Markov property embodies a natural first generalisation of the independence assumption. It assumes a kind of one-step dependence or memory. Specifically, for all Borel sets B,
P(Xn+1 ∈ B | Xn = xn, Xn-1 = xn-1, . . .) = P(Xn+1 ∈ B | Xn = xn).
Slide 53
Discrete-Time Markov Chains
A random sequence {Xn, n ≥ 0} with a countable state space (without loss of generality {1, 2, . . .}) forms a DTMC if
P(Xn+1 = k | Xn = j, Xn-1 = xn-1, . . . , X0 = x0) = P(Xn+1 = k | Xn = j).
This enables us to write
P(Xn+1 = k | Xn = j) = pjk(n).
Furthermore, we commonly assume that the transition probabilities pjk(n) do not depend on n, in which case the DTMC is called homogeneous (more precisely, time homogeneous) and we write pjk(n) = pjk.
Slide 54
Discrete-Time Markov Chains
We can associate a directed graph with a DTMC by letting the nodes correspond to states and putting in an arc j → k if pjk > 0.
Slide 56
Discrete-Time Markov Chains
For a transition matrix of a DTMC:
Each entry is ≥ 0.
Each row sums to 1.
Any square matrix having these two properties is called a stochastic matrix.
Slide 57
Discrete-Time Markov Chains
Examples:
If the {Xn} are independent and identically-distributed random variables with P(Xi = k) = pk, what is the transition matrix of the DTMC?
A communication system transmits the digits 0 and 1. At each time point, there is a probability p that the digit will not change and probability 1 - p that it will change.
Slide 58
Discrete-Time Markov Chains
Suppose that whether or not it rains tomorrow depends on previous weather conditions only through whether or not it is raining today, and not on past weather conditions. Suppose also that if it rains today, then it will rain tomorrow with probability p, and if it does not rain today, then it will rain tomorrow with probability q. If we say that the process is in state 0 when it rains and state 1 when it does not rain, then the above is a two-state Markov chain.
A simple random walk. Let a sequence of random variables {Xn} ⊆ Z be defined by Xn+1 = Xn + Yn+1, where {Yn} are independent and identically-distributed random variables with P(Yn = 1) = p, P(Yn = -1) = 1 - p.
Slide 59
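For the two-state rain chain, the n-step probabilities have a closed form that can be checked against matrix powers: with P = [[p, 1-p], [q, 1-q]] one finds p(n)00 = π0 + (p - q)^n (1 - π0), where π0 = q/(1 - p + q) is the long-run probability of rain. A sketch with illustrative values p = 0.7, q = 0.4 (our choice, not the notes'):

```python
def mat_mult(A, B):
    # Multiply two 2x2 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p, q = 0.7, 0.4                  # illustrative transition probabilities
P = [[p, 1 - p], [q, 1 - q]]     # state 0 = rain, state 1 = no rain

Pn = [[1.0, 0.0], [0.0, 1.0]]    # identity = P^0
pi0 = q / (1 - p + q)            # long-run probability of rain, here 4/7
for n in range(1, 11):
    Pn = mat_mult(Pn, P)
    closed = pi0 + (p - q) ** n * (1 - pi0)
    assert abs(Pn[0][0] - closed) < 1e-12   # matrix power matches closed form
print(Pn[0][0])  # after 10 steps, already close to pi0 = 4/7
```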
Discrete-Time Markov Chains
The n-step transition probabilities P(Xm+n = j | Xm = i) of a homogeneous DTMC do not depend on m. For n = 1, 2, . . . , we denote them by
p(n)ij = P(Xm+n = j | Xm = i).
It is also convenient to use the notation
p(0)ij := 1 if j = i, 0 if j ≠ i.
Slide 60
Discrete-Time Markov Chains
The Chapman-Kolmogorov equations show how we can calculate the p(n)ij from the pij. For n = 1, 2, . . . and any r = 1, 2, . . . , n,
p(n)ij = Σ_k p(r)ik p(n-r)kj.
Slide 61
Discrete-Time Markov Chains
If we define the n-step transition matrix P(n) as the matrix with (i, j) entry p(n)ij, then the Chapman-Kolmogorov equations can be written in the matrix form
P(n) = P(r) P(n-r)
with P(1) = P. By mathematical induction, it follows that
P(n) = P^n,
the nth power of P.
Slide 62
Discrete-Time Markov Chains
How do we determine the distribution of a DTMC? We have
the initial distribution π0 = (π0,1, π0,2, . . .), where π0,j = P(X0 = j) for all j, and
the transition matrix P.
In principle, we can use these and the Markov property to derive the finite-dimensional distributions, although the calculations are frequently intractable. For k ≥ 1 and t1 < · · · < tk, the joint distribution of (Xt1, . . . , Xtk) can be written in terms of π0 and products of entries of powers of P.
Example
Suppose P(X0 = 1) = 1/3, P(X0 = 2) = 0, P(X0 = 3) = 1/2, P(X0 = 4) = 1/6 and
P =
1/4 0 1/4 1/2
1/4 1/4 1/4 1/4
0 0 2/3 1/3
1/2 0 1/2 0
Find the distribution of X1,
Calculate P(Xn+2 = 2 | Xn = 4), and
Calculate P(X3 = 2, X2 = 3, X1 = 1).
Slide 64
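The three quantities reduce to the products π1 = π0 P, the (4, 2) entry of P², and P(X1 = 1) p13 p32; a sketch computing them in exact arithmetic (states are 1-based on the slide, 0-indexed in code):

```python
from fractions import Fraction as F

P = [
    [F(1, 4), F(0), F(1, 4), F(1, 2)],
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
    [F(0), F(0), F(2, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2), F(0)],
]
pi0 = [F(1, 3), F(0), F(1, 2), F(1, 6)]

# Distribution of X1: pi1 = pi0 * P
pi1 = [sum(pi0[i] * P[i][j] for i in range(4)) for j in range(4)]

# P(X_{n+2} = 2 | X_n = 4) = (P^2) entry (4, 2); 1-based states -> indices [3][1]
p2_42 = sum(P[3][k] * P[k][1] for k in range(4))

# P(X3 = 2, X2 = 3, X1 = 1) = P(X1 = 1) * p13 * p32
path = pi1[0] * P[0][2] * P[2][1]

print(pi1, p2_42, path)  # pi1 = (1/6, 0, 1/2, 1/3); the last two are 0 since p12 = p32 = 0
```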
Discrete-Time Markov Chains
Fundamental questions that we quite often want to ask are:
What proportion of time does the chain spend in each state in the long run?
Does this question even make sense?
The answer depends on the classification of states.
Slide 65
Discrete-Time Markov Chains
Here are some definitions.
State k is accessible from state j, denoted by j → k, if there exists an n ≥ 1 such that p(n)jk > 0. That is, there exists a path j = i0, i1, i2, . . . , in = k such that p_{i0 i1} p_{i1 i2} · · · p_{i(n-1) in} > 0.
If j → k and k → j, then states j and k communicate, denoted by j ↔ k.
State j is called non-essential if there exists a state k such that j → k but k ↛ j.
State j is called essential if j → k implies that k → j.
A state j is an absorbing state if pjj = 1. An absorbing state is essential, but essential states do not have to be absorbing.
Slide 66
Discrete-Time Markov Chains
Example
Draw a transition diagram and then classify the states of a DTMC with transition matrix
P =
0 0.5 0.5 0
0.5 0 0 0.5
0 0 0.5 0.5
0 0 0.5 0.5
Slide 67
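The classification can be automated: compute reachability (a transitive closure), group mutually reachable states into communicating classes, and mark a state essential when every state it leads to leads back. A sketch applied to the matrix above (our code, not from the notes):

```python
P = [
    [0, 0.5, 0.5, 0],
    [0.5, 0, 0, 0.5],
    [0, 0, 0.5, 0.5],
    [0, 0, 0.5, 0.5],
]
n = len(P)

# reach[j][k] = True iff j -> k in one or more steps (Floyd-Warshall closure)
reach = [[P[j][k] > 0 for k in range(n)] for j in range(n)]
for m in range(n):
    for j in range(n):
        for k in range(n):
            reach[j][k] = reach[j][k] or (reach[j][m] and reach[m][k])

# Communicating classes: states that reach each other.
classes = []
for j in range(n):
    cls = frozenset(k for k in range(n) if reach[j][k] and reach[k][j])
    if cls not in classes:
        classes.append(cls)

# A state j is essential iff j -> k always implies k -> j.
essential = {j: all(reach[k][j] for k in range(n) if reach[j][k]) for j in range(n)}
print(classes, essential)  # {0,1} communicate but are non-essential; {2,3} are essential
```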
Discrete-Time Markov Chains
A state j which is such that j ↛ j is called ephemeral. Ephemeral states usually don't add anything to a DTMC model and we are going to assume that there are no such states. With this assumption, the communication relation ↔ has the properties
j ↔ j (reflexivity),
j ↔ k if and only if k ↔ j (symmetry), and
if j ↔ k and k ↔ i, then j ↔ i (transitivity).
A relation that satisfies these properties is known as an equivalence relation.
Slide 68
Discrete-Time Markov Chains
Consider a set S whose elements can be related to each other via any equivalence relation. Then S can be partitioned into a collection of disjoint subsets S1, S2, S3, . . . , SM (where M might be infinite) such that j, k ∈ Sm implies that j ↔ k.
So the state space of a DTMC is partitioned into communicating classes by the communication relation.
Slide 69
Discrete-Time Markov Chains
An essential state cannot be in the same communicating class as a non-essential state. This means that we can divide the sets in the partition S1, S2, S3, . . . , SM into a collection S^n_1, S^n_2, S^n_3, . . . , S^n_{Mn} of non-essential communicating classes and a collection S^e_1, S^e_2, S^e_3, . . . , S^e_{Me} of essential communicating classes.
If the DTMC starts in a state from a non-essential communicating class S^n_m, then once it leaves, it can never return. On the other hand, if the DTMC starts in a state from an essential communicating class S^e_m, then it can never leave it.
Definition: If a DTMC has only one communicating class, that is, all states communicate with each other, then it is called an irreducible DTMC.
Slide 70
Discrete-Time Markov Chains
Example
Classify the states of the DTMC with
P =
0.5 0.5 0 0
0.5 0.5 0 0
0.25 0.15 0.45 0.15
0 0 0 1
Slide 71
Discrete-Time Markov Chains
Exercise
Classify the states of the DTMC with (+ denotes a positive entry)
P =
0 0 + 0 0 0 +
0 + 0 + 0 0 +
+ 0 0 0 0 0 0
0 0 0 + 0 0 0
0 + 0 0 0 0 0
0 + 0 0 + + 0
0 0 + 0 0 0 +
Slide 72
Discrete-Time Markov Chains
Now let's revisit the random walk example where Xn+1 = Xn + Yn+1, with {Yn} independent and identically-distributed, X0 = 0, P(Yn = 1) = p and P(Yn = -1) = 1 - p = q. This DTMC is irreducible and so all states are essential. However,
if p > q, then E(Xn - X0) = n(p - q) > 0, so Xn will drift to infinity, at least in expectation.
For each fixed state j, with probability one, the DTMC will visit j only finitely many times.
A state's long-run behaviour is not captured by being essential in this case: we need a further classification of the states.
Slide 73
Recurrence and Transience of States
Let X0 = j and Ti(j) be the time between the (i - 1)st and ith return to state j. Then T1(j), T2(j), . . . are independent and identically distributed random variables.
Slide 74
Recurrence and Transience of States
Our further classification relies on calculating the probability that the DTMC returns to a state once it has left. Let
fj = P(Xn = j for some n > 0 | X0 = j) = P(T1(j) < ∞ | X0 = j).
The state j is said to be recurrent if fj = 1 and transient if fj < 1.
If the DTMC starts in a recurrent state j then, with probability one, it will eventually re-enter j. At this point, the process will start anew (by the Markov property) and it will re-enter again with probability one. So the DTMC will (with probability one) visit j infinitely-many times.
If the DTMC starts in a transient state j, then there is a probability 1 - fj > 0 that it will never return. So, letting Nj be the number of visits to state j after starting there, we see that Nj has a geometric distribution. Specifically, for n ≥ 0,
P(Nj = n | X0 = j) = fj^n (1 - fj).
It follows that j is recurrent if and only if the expected number of returns Σ_{n=1}^∞ p(n)jj is infinite.
Recurrence is a class property. Suppose j is recurrent and j ↔ k, so there exist s and t with p(s)jk > 0 and p(t)kj > 0. Then
p(s+n+t)jj = P(Xs+n+t = j | X0 = j) ≥ P(Xs+n+t = j, Xs+n = k, Xs = k | X0 = j) = p(s)jk p(n)kk p(t)kj.
Similarly, p(s+n+t)kk ≥ p(t)kj p(n)jj p(s)jk, and so, for n > s + t,
α p(n-s-t)jj ≤ p(n)kk ≤ p(n+s+t)jj / α,
where α = p(s)jk p(t)kj. So the series Σ_{n=1}^∞ p(n)kk must diverge because Σ_{n=1}^∞ p(n)jj diverges, and we conclude that state k is also recurrent.
Slide 78
If the Markov chain is irreducible, then all states are either recurrent or transient and so it's appropriate to refer to the chain as either recurrent or transient.
Slide 79
The Random Walk
Let Xn+1 = Xn + Yn+1, where {Yn : n ≥ 1} are independent and identically-distributed random variables with P(Yn = 1) = p and P(Yn = -1) = 1 - p = q. We can compute the m-step transition probabilities from state j to itself by observing that these probabilities are zero if m is odd and equal to
C(2n, n) p^n q^n
if m = 2n.
Slide 80
The Random Walk
Stirling's formula n! ~ √(2πn) n^n e^(-n) gives us the fact that
p(2n)jj ~ (4pq)^n / √(πn),
and the series Σ_{n=1}^∞ p(2n)jj
diverges if p = q = 1/2, so the DTMC is recurrent,
converges if p ≠ q (compare to a geometric series, since then 4pq < 1), so the DTMC is transient.
Slide 81
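The dichotomy is visible numerically: partial sums of p(2n)jj = C(2n, n) (pq)^n keep growing when p = 1/2 but level off when p ≠ q. A sketch (the terms are built up via a ratio so no huge binomial coefficients are formed; the comparison values use the identity Σ_{n≥0} C(2n, n) x^n = 1/√(1 − 4x)):

```python
def partial_sum(p, terms):
    # sum of C(2n, n) (pq)^n for n = 1..terms, using the term ratio
    # a_{n+1}/a_n = 2(2n+1)/(n+1) * pq to stay in float range
    q = 1 - p
    a = 2 * p * q            # n = 1 term: C(2, 1) p q
    total = a
    for n in range(1, terms):
        a *= 2 * (2 * n + 1) / (n + 1) * p * q
        total += a
    return total

s_half = partial_sum(0.5, 2000)    # keeps growing like a multiple of sqrt(terms)
s_biased = partial_sum(0.6, 2000)  # settles at 1/sqrt(1 - 4pq) - 1 = 4
print(s_half, s_biased)
```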
Periodicity
The Polya random walk illustrates another phenomenon that can occur in DTMCs - periodicity.
Definition: State j is periodic with period d > 1 if {n : p(n)jj > 0} is non-empty and has greatest common divisor d.
If state j has period 1, then we say that it is aperiodic.
Slide 82
Discrete-Time Markov Chains
Examples
The Polya random walk has period d = 2 for all states j.
What is the period of the DTMC with
P =
0 0.5 0.5
1 0 0
1 0 0
?
Find the period for the DTMC with
P =
0 0 0.5 0.5
1 0 0 0
0 1 0 0
0 1 0 0
Slide 83
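The period of a state can be computed as the gcd of the return times read off boolean powers of the transition matrix; a sketch applied to both example chains (our code):

```python
from math import gcd

def period(P, j, max_n=24):
    # gcd of {n <= max_n : p(n)jj > 0}, using boolean matrix powers
    n_states = len(P)
    adj = [[P[a][b] > 0 for b in range(n_states)] for a in range(n_states)]
    power = adj                      # boolean version of P^1
    d = 0
    for n in range(1, max_n + 1):
        if power[j][j]:
            d = gcd(d, n)
        power = [[any(power[a][m] and adj[m][b] for m in range(n_states))
                  for b in range(n_states)] for a in range(n_states)]
    return d

P3 = [[0, 0.5, 0.5], [1, 0, 0], [1, 0, 0]]
P4 = [[0, 0, 0.5, 0.5], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(period(P3, 0), period(P4, 0))  # 2 and 3
```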
States in a communicating class have the same period
Assume that state j has period dj and j ↔ k. Then, as before, there must exist s and t such that p(s)jk > 0 and p(t)kj > 0. We know straightaway that dj divides s + t, since it is possible to go from j to itself in s + t steps.
Now take a path from k to itself in r steps. If we concatenate our path from j to k in s steps, this r-step path, and our path from k to j in t steps, we have an (s + r + t)-step path from j to itself. So dj divides s + r + t, which means that dj divides r. So dj divides the period dk of k.
Now we can switch j and k in the argument to conclude that dk divides dj, which means that dj = dk, and all states in the same communicating class have a common period.
Slide 84
Discrete-Time Markov Chains
The arguments on the preceding slides bring us to the following theorem, which discusses some solidarity properties of states in the same communicating class.
Theorem: In any communicating class Sr of a DTMC with state space S, the states are
either all recurrent or all transient, and
either all aperiodic or all periodic with a common period d > 1.
If states from Sr are periodic with period d > 1, then Sr = S^(1)_r ∪ S^(2)_r ∪ · · · ∪ S^(d)_r, where the DTMC passes from the subclass S^(i)_r to S^(i+1)_r (with S^(d+1)_r = S^(1)_r) with probability one at a transition.
Slide 85
Discrete-Time Markov Chains
Examples:
A chain with period $d = 4$: (transition diagram omitted)
Slide 86
Discrete-Time Markov Chains
How do we analyse a DTMC?
Draw a transition diagram.
Consider the accessibility of states, then divide the state space into essential and non-essential states.
Define the communicating classes, and divide them into recurrent and transient communicating classes.
Decide whether the classes are periodic.
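The second and third steps above can be mechanised for small chains: $i \leftrightarrow j$ exactly when each can reach the other, which a transitive closure of the "one-step" relation detects. A minimal sketch (not from the notes; the function name is my own):

```python
import numpy as np

def communicating_classes(P):
    """Partition states 0..n-1 into communicating classes (i <-> j)."""
    n = len(P)
    reach = np.asarray(P) > 0
    for k in range(n):                  # Warshall transitive closure
        for i in range(n):
            if reach[i, k]:
                reach[i] |= reach[k]    # i reaches everything k reaches
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = sorted({i} | {j for j in range(n)
                            if reach[i, j] and reach[j, i]})
        classes.append(cls)
        seen.update(cls)
    return classes
```

For the first exercise matrix all three states communicate, so there is a single (hence irreducible) class; a reducible chain such as `[[1, 0], [0.5, 0.5]]` splits into `[[0], [1]]`.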
Slide 87
Discrete-Time Markov Chains
Exercises
Analyse the DTMC with
$$P = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$
Consider a DTMC with
$$P = \begin{pmatrix} 0 & 0 & 0.5 & 0.5 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
Slide 88
Discrete-Time Markov Chains
Example
Analyse the Markov chain with states numbered 1 to 5 and with one-step transition probability matrix
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
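For a finite chain, a communicating class is recurrent exactly when it is closed (no probability leaks out of it). A quick check for this example, with states renumbered 0 to 4 (a sketch, not from the notes; class membership below is read off the transition diagram by hand):

```python
import numpy as np

# The 5-state example, states renumbered 0..4.
P = np.array([[1,   0,   0,   0, 0],
              [0.5, 0,   0.5, 0, 0],
              [0,   0.5, 0.5, 0, 0],
              [0,   0,   0,   0, 1],
              [0,   0,   0,   1, 0]], dtype=float)

def is_closed(P, cls):
    """True if no one-step transition leaves the set of states cls."""
    cls = set(cls)
    return all(P[i, j] == 0
               for i in cls for j in range(len(P)) if j not in cls)

# Communicating classes, found by inspection of the diagram:
classes = [[0], [1, 2], [3, 4]]
```

Here `[0]` and `[3, 4]` are closed, hence recurrent (state 0 is absorbing, and states 3 and 4 cycle with period 2), while `[1, 2]` can escape to state 0 and is therefore transient.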
Slide 89
Finite State DTMCs have at least one recurrent state
Recall that a state $j$ is transient if and only if
$$\sum_{n=1}^{\infty} p^{(n)}_{jj} = \sum_{n=1}^{\infty} E[I(X_n = j) \mid X_0 = j] < \infty.$$
This means that the DTMC visits $j$ only finitely many times (with probability one), given that it starts there.
Let $S$ be the set of states, and $f_{j,k}$ be the probability that the DTMC ever visits state $k$, given that it starts in state $j$.
Slide 90
Finite State DTMCs have at least one recurrent state
If all states $k \in S$ are transient, then it must be the case that, for every $j$,
$$\sum_{n=1}^{\infty} \sum_{k \in S} p^{(n)}_{jk} \le \sum_{n=1}^{\infty} p^{(n)}_{jj} + \sum_{k \neq j} f_{j,k} \sum_{n=1}^{\infty} p^{(n)}_{kk} < \infty,$$
since $S$ is finite. But $\sum_{k \in S} p^{(n)}_{jk} = 1$ for every $n$, so the left-hand side is infinite, which is a contradiction. Hence a finite-state DTMC has at least one recurrent state.
An infinite-state DTMC need not: for example, the random walk with $p > 1/2$ has all states transient.
Slide 92
Recurrence in Infinite State DTMC
In order to be able to tell whether a class is recurrent, we need to be able to calculate the probability of return for at least one state.
Let's label this state 0 and denote by $f_{j,0}$ the probability that the DTMC ever reaches state 0, given that it starts in state $j$. Then we see that the sequence $\{f_{j,0}\}$ satisfies the equation
$$f_{j,0} = p_{j0} + \sum_{k \neq 0} p_{jk} f_{k,0}. \qquad (*)$$
Slide 93
Solving the equation (*)
We illustrate how to solve this equation by example.
Example: Consider a random walk on the nonnegative integers:
$$p_{j,j+1} = p = 1 - p_{j,j-1}, \quad \text{for } j > 0,$$
and
$$p_{0,1} = p = 1 - p_{0,0}.$$
(And $p_{ij} = 0$ otherwise.) Equation (*) says that, for $j > 1$,
$$f_{j,0} = p f_{j+1,0} + (1 - p) f_{j-1,0}$$
and, for $j = 0, 1$,
$$f_{j,0} = p f_{j+1,0} + (1 - p).$$
Slide 94
Solving the equation (*)
The first equation is a second-order linear difference equation with constant coefficients.
These can be solved in a similar way to second-order linear differential equations with constant coefficients, which you learned about in Calculus II or Accelerated Mathematics II. Recall that, to solve
$$a \frac{d^2 y}{dt^2} + b \frac{dy}{dt} + c y = 0,$$
we try a solution of the form $y = y(t) = e^{\lambda t}$ to obtain the Characteristic Equation
$$a\lambda^2 + b\lambda + c = 0.$$
Slide 95
Solving the equation (*)
If the characteristic equation has distinct roots, $\lambda_1$ and $\lambda_2$, the general solution has the form
$$y = A e^{\lambda_1 t} + B e^{\lambda_2 t}.$$
If the roots are coincident, the general solution has the form
$$y = A e^{\lambda_1 t} + B t e^{\lambda_1 t}.$$
In both cases, the values of the constants $A$ and $B$ are determined by the initial conditions.
Slide 96
Solving the equation (*)
The method for solving second-order linear difference equations with constant coefficients is similar. To solve
$$a u_{j+1} + b u_j + c u_{j-1} = 0,$$
we try a solution of the form $u_j = m^j$ to obtain the Characteristic Equation
$$a m^2 + b m + c = 0.$$
Slide 97
Solving the equation (*)
If this equation has distinct roots, $m_1$ and $m_2$, the general solution has the form
$$u_j = A m_1^j + B m_2^j.$$
If the roots are coincident, the general solution has the form
$$u_j = A m_1^j + B j m_1^j.$$
The values of the constants $A$ and $B$ need to be determined by boundary conditions, or other information that we have.
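The recipe above can be checked numerically. For the random-walk equation, $f_j = p f_{j+1} + (1-p) f_{j-1}$ rearranges to $p\,u_{j+1} - u_j + (1-p)\,u_{j-1} = 0$, so $a = p$, $b = -1$, $c = 1-p$. A small sketch (not from the notes; the values of $p$, $A$, $B$ are arbitrary illustrations):

```python
import numpy as np

# Characteristic equation p*m^2 - m + (1-p) = 0 for the random-walk
# recurrence p*u_{j+1} - u_j + (1-p)*u_{j-1} = 0.
p = 0.7
roots = np.sort(np.roots([p, -1.0, 1 - p]).real)
# The roots are m = (1-p)/p and m = 1.

# Any combination u_j = A*m1^j + B*m2^j satisfies the recurrence:
A, B = 0.3, 0.7
u = lambda j: A * roots[0] ** j + B * roots[1] ** j
for j in range(1, 10):
    assert abs(p * u(j + 1) - u(j) + (1 - p) * u(j - 1)) < 1e-12
```

The root $m = 1$ always appears here because the coefficients $p$ and $1-p$ sum to 1, which is why constant solutions are possible.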
Slide 98
Solving the equation (*)
If $(1-p)/p > 1$, then the general solution is
$$f_{j,0} = A + B\left(\frac{1-p}{p}\right)^j.$$
Similarly, if $(1-p)/p = 1$, the general solution is of the form
$$f_{j,0} = A + Bj.$$
In either case, these can only be probabilities if $B = 0$, and then notice
$$A = f_{1,0} = p f_{2,0} + (1-p) = pA + (1-p),$$
so $A = 1$. This makes sense because $p \le 1/2$, and so we have a neutral or downward drift.
Slide 100
Solving the equation (*)
However, if $(1-p)/p < 1$, then both constants can be non-zero, and we need another way to identify $\{f_{j,0}\}$. Let $f_{j,0}(m)$ be the probability that the DTMC reaches state 0 within $m$ steps, given that it starts in state $j$, and let $\{g_{j,0}\}$ be any nonnegative solution to (*), so that
$$g_{j,0} = p_{j0} + \sum_{k \neq 0} p_{jk} g_{k,0}.$$
We show by induction that $f_{j,0}(m) \le g_{j,0}$ for all $m$. Clearly this is true for $m = 1$. Assume that it is true for $m = \ell$. Then
$$f_{j,0}(\ell+1) = p_{j0} + \sum_{k \neq 0} p_{jk} f_{k,0}(\ell) \le p_{j0} + \sum_{k \neq 0} p_{jk} g_{k,0} = g_{j,0}.$$
It follows that $f_{j,0} = \lim_{m \to \infty} f_{j,0}(m) \le g_{j,0}$, and so $\{f_{j,0}\}$ is the minimal nonnegative solution to (*).
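The limit $\lim_m f_{j,0}(m)$ can be computed directly by iterating the defining recursion from an all-zero start, since the iterates increase to the minimal nonnegative solution. A numerical sketch (not from the notes; the truncation boundary `J` and iteration count are my own choices, with the infinite state space cut off at `J` where the value is pinned to 0):

```python
# Successive approximation for the random walk with p = 0.7: iterate
#   f_{j,0}(m+1) = p_{j0} + sum_{k != 0} p_{jk} f_{k,0}(m)
# starting from all zeros.
p, J, iters = 0.7, 100, 5000
h = [1.0] + [0.0] * J     # h[0] = 1 is a boundary convention: already at 0
for _ in range(iters):
    h = ([1.0]
         + [(1 - p) * h[j - 1] + p * h[j + 1] for j in range(1, J)]
         + [0.0])         # artificial absorbing boundary at J
# For j >= 1, h[j] converges to the minimal solution ((1-p)/p)^j.
```

With upward drift ($p > 1/2$) the truncation at `J = 100` is harmless, because $((1-p)/p)^{100}$ is negligibly small.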
Slide 102
Solving the equation (*)
For the random walk with $(1-p)/p < 1$, the general solution for $j \ge 1$ was of the form
$$f_{j,0} = A + B\left(\frac{1-p}{p}\right)^j.$$
The minimal nonnegative solution for $j > 0$ is
$$f_{j,0} = \left(\frac{1-p}{p}\right)^j,$$
and $f_{0,0} = 2(1-p)$.
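The minimal solution can be sanity-checked by simulation: estimate the probability that the walk started at $j$ ever reaches 0 and compare with $((1-p)/p)^j$. A Monte Carlo sketch (not from the notes; the function name, trial count, and path cap are my own choices, and paths are truncated because with $p > 1/2$ a path that drifts away essentially never returns):

```python
import random

random.seed(2025)   # for reproducibility

def hit_prob_sim(p, start, trials=10_000, cap=500):
    """Estimate f_{j,0}: the probability that the walk started at
    start >= 1 ever reaches 0, truncating each path after cap steps."""
    hits = 0
    for _ in range(trials):
        x = start
        for _ in range(cap):
            x += 1 if random.random() < p else -1
            if x == 0:
                hits += 1
                break
    return hits / trials

p, j = 0.7, 3
theory = ((1 - p) / p) ** j      # minimal solution: (3/7)^3
est = hit_prob_sim(p, j)
```

With 10,000 trials the estimate typically lands within about $\pm 0.01$ of the theoretical value.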
Slide 103
The Gambler's Ruin Problem
Denote the initial capital of a gambler by $N$.
The gambler will stop playing if he/she wins \$$M$ or loses his/her initial stake of \$$N$.
There is a probability $p$ that the gambler wins \$1 and a probability $1-p$ that he/she loses \$1 on each game.
We assume that the outcomes of successive plays are independent.
This is a simple DTMC with a finite state space $\{-N, \ldots, M\}$ and transition probabilities $p_{j,j+1} = p$ and $p_{j,j-1} = 1-p$ for $j \in \{-N+1, \ldots, M-1\}$, and $p_{-N,-N} = p_{M,M} = 1$.
The gambler would like to know the probability that he/she will win \$$M$ before becoming bankrupt.
Slide 104
The Gambler's Ruin Problem
Let $f_{j,-N}$ denote the probability that the gambler is ruined (the DTMC hits $-N$ before $M$), given that it starts in state $j$. For $p \neq 1/2$, the general solution is
$$f_{j,-N} = A + B\left(\frac{1-p}{p}\right)^j,$$
with boundary conditions $f_{M,-N} = 0$ and $f_{-N,-N} = 1$.
The upper boundary condition gives us
$$A = -B\left(\frac{1-p}{p}\right)^M,$$
and the lower boundary condition gives us
$$B = \left[\left(\frac{1-p}{p}\right)^{-N} - \left(\frac{1-p}{p}\right)^M\right]^{-1},$$
so the general solution is
$$f_{j,-N} = \frac{\left(\frac{1-p}{p}\right)^j - \left(\frac{1-p}{p}\right)^M}{\left(\frac{1-p}{p}\right)^{-N} - \left(\frac{1-p}{p}\right)^M}.$$
Slide 106
The Gambler's Ruin Problem
When $p = 1/2$, the general solution to the first equation is
$$f_{j,-N} = A + Bj.$$
The upper boundary condition gives us
$$A = -BM,$$
and the lower boundary condition gives us
$$B = -\frac{1}{M+N},$$
so the general solution is
$$f_{j,-N} = \frac{M - j}{M + N}.$$
Slide 107
The Gamblers Ruin Problem
The expected gain is $E(G) = M - (N+M) f_{0,-N}$. Here are some numbers:
If $N = 90$, $M = 10$ and $p = 0.5$, then $f_{0,-N} = 0.1$.
If $N = 90$, $M = 10$ and $p = 0.45$, then $f_{0,-N} = 0.866$.
If $N = 90$, $M = 10$ and $p = 0.4$, then $f_{0,-N} = 0.983$.
If $N = 99$, $M = 1$ and $p = 0.4$, then $f_{0,-N} = 0.333$.
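The numbers above follow directly from the two formulas just derived. A small sketch that reproduces them (not from the notes; the function names are my own):

```python
def ruin_prob(N, M, p):
    """Probability f_{0,-N} of hitting -N (ruin) before M, starting
    from 0, using the gambler's ruin formulas derived above."""
    if p == 0.5:
        return M / (M + N)
    theta = (1 - p) / p
    return (1 - theta ** M) / (theta ** (-N) - theta ** M)

def expected_gain(N, M, p):
    """E(G) = M - (N + M) * f_{0,-N}."""
    return M - (N + M) * ruin_prob(N, M, p)
```

For example, `ruin_prob(90, 10, 0.45)` gives approximately 0.866, matching the slide, and `expected_gain(90, 10, 0.5)` is 0: a fair game has zero expected gain.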
Slide 108
Long run behaviour of DTMCs
We want to know the proportion of time a DTMC spends in each state over the long run (if this concept makes sense), which should be the same as the limiting probabilities $\lim_{n\to\infty} p^{(n)}_{kj}$.
These will be zero for transient states and non-essential states.
For an irreducible and recurrent DTMC, we will see that these limiting probabilities exist and are even independent of $k$.
Slide 109
Long run behaviour of DTMCs
Recall that we used $T_i(j)$ to denote the time between the $(i-1)$st and $i$th return to state $j$. We then defined state $j$ (and hence its communicating class) to be
transient if $T_i(j) = \infty$ with positive probability, and
recurrent if $T_i(j) < \infty$ with probability one.
There is a further classification of recurrent states. Specifically, $j$ is
null-recurrent if $E[T_i(j)] = \infty$, and
positive-recurrent if $E[T_i(j)] < \infty$.
This classification is important for the calculation of the limiting probabilities.
Slide 110
Long run behaviour of DTMCs
Examples
The symmetric random walk with $p = q = 1/2$: for all $j$, $T_i(j) < \infty$ with probability one, but $E[T_i(j)] = \infty$. That is, all states are null-recurrent.
A finite irreducible DTMC: $E[T_i(j)] < \infty$ for all $j$, so all states are positive-recurrent.
Slide 111
Long run behaviour of DTMCs
In the long run, how often does a DTMC visit a state $j$? Let $\mu_j \equiv E[T_1(j) \mid X_0 = j] < \infty$. By the Law of Large Numbers, $T_1(j) + T_2(j) + \cdots + T_k(j) \approx \mu_j k$. So there are approximately $k$ visits in $k\mu_j$ time-steps, and the relative frequency of visits to $j$ is $1/\mu_j$. This leads us to
Theorem: If $j$ is an aperiodic state in a positive recurrent communicating class, then $\pi_j = E[T_1(j) \mid X_0 = j]^{-1}$, which is the limiting probability $\lim_{n\to\infty} p^{(n)}_{kj}$.
Theorem: If the states of a communicating class $S_r$ with period $d$ are recurrent with a finite expected recurrence time, then for $0 \le k \le d-1$, $\{X_{nd+k} \mid X_0 \in S_r\}$ is an ergodic DTMC with state space $S_r^{(k)}$.
For any $\ell$ and $k = 0, 1, \ldots, d-1$,
$$P(X_{nd+k} = j \mid X_0 \in S_r^{(\ell)}) \to \pi^{(r)}_j \ \text{as } n \to \infty \ \text{for } j \in S_r^{(\ell + k \bmod d)},$$
and $P(X_{nd+k} = j \mid X_0 \in S_r^{(\ell)}) = 0$ for $j \notin S_r^{(\ell + k \bmod d)}$, so $\sum_{j \in S_r^{(\ell)}} \pi^{(r)}_j = 1$ for any $\ell$.
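The theorem can be seen concretely on the period-2 example from earlier, with subclasses $\{0\}$ and $\{1,2\}$: powers $P^n$ do not converge, but along even (or odd) powers the rows settle on one subclass. A small sketch (not from the notes):

```python
import numpy as np

# The period-2 chain from the earlier example: subclasses {0} and {1, 2}.
P = np.array([[0,   0.5, 0.5],
              [1.0, 0,   0],
              [1.0, 0,   0]])
P_even = np.linalg.matrix_power(P, 1000)   # even number of steps
P_odd = np.linalg.matrix_power(P, 1001)    # odd number of steps
# Starting in {0}: at even times all mass is back on {0},
# and at odd times it is spread over {1, 2}.
```

The even-power row from state 0 is $(1, 0, 0)$ and the odd-power row is $(0, 0.5, 0.5)$: each is a probability distribution concentrated on a single subclass, exactly as the theorem describes.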
Slide 126
Discrete-Time Markov Chains
Example
Classify the DTMC with
$$P = \begin{pmatrix}
0 & 0 & 0.1 & 0.9 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/3 & 2/3 & 0 \\
0 & 0 & 0 & 0 & 1/5 & 4/5 & 0 \\
1/4 & 1/4 & 0 & 0 & 1/8 & 1/4 & 1/8
\end{pmatrix}$$
and discuss its properties.
Slide 127
Good Trick
Sometimes we want to model a physical system where the future depends on part of the past. Consider the following example.
A sequence of random variables $\{X_n\}$ describes the weather at a particular location, with $X_n = 1$ if it is sunny and $X_n = 2$ if it is rainy on day $n$.
Suppose that the weather on day $n+1$ depends on the weather conditions on days $n-1$ and $n$ as is shown below:
$$P(X_{n+1} = 2 \mid X_n = X_{n-1} = 2) = 0.6$$
$$P(X_{n+1} = 1 \mid X_n = X_{n-1} = 1) = 0.8$$
$$P(X_{n+1} = 2 \mid X_n = 2, X_{n-1} = 1) = 0.5$$
$$P(X_{n+1} = 1 \mid X_n = 1, X_{n-1} = 2) = 0.75$$
Slide 128
Good Trick
If we put $Y_n = (X_{n-1}, X_n)$, then $Y_n$ is a DTMC. The possible states are $1 = (1,1)$, $2 = (1,2)$, $3 = (2,1)$ and $4 = (2,2)$. We see that $\{Y_n : n \ge 1\}$ is a DTMC with transition matrix
$$P = \begin{pmatrix}
0.8 & 0.2 & 0 & 0 \\
0 & 0 & 0.5 & 0.5 \\
0.75 & 0.25 & 0 & 0 \\
0 & 0 & 0.4 & 0.6
\end{pmatrix}.$$
Slide 129