8/9/2019 Stochastic modelling Notes Discrete
MAST30001 Stochastic Modelling
Lecturer: Nathan Ross
Administration
LMS - announcements, grades, course documents
Lectures/Practicals
Student-staff liaison committee (SSLC) representative
Slide 1
Modelling
We develop an imitation of the system. It could be, for example,
a small replica of a marina development,
a set of equations describing the relations between stock prices,
a computer simulation that reproduces a complex system (think: the paths of planets in the solar system).
We use a model
to understand the evolution of a system,
to understand how outputs relate to inputs, and
to decide how to influence a system.
Slide 2
Why do we model?
We want to understand how a complex system works. Real-world experimentation can be
too slow,
too expensive,
possibly too dangerous,
and may not deliver insight.
The alternative is to build a physical, mathematical or computational model that captures the essence of the system that we are interested in (think: NASA).
Slide 3
Why a stochastic model?
We want to model such things as
traffic on the Internet,
stock prices and their derivatives,
waiting times in healthcare queues,
reliability of multicomponent systems,
interacting populations,
epidemics,
where the effects of randomness cannot be ignored.
Slide 4
Good mathematical models
capture the non-trivial behaviour of a system,
are as simple as possible,
replicate empirical observations,
are tractable - they can be analysed to derive the quantities of interest, and
can be used to help make decisions.
Slide 5
Stochastic modelling
Stochastic modelling is about the study of random experiments. For example,
toss a coin once, toss a coin twice, toss a coin infinitely many times,
the lifetime of a randomly selected battery (quality control),
the operation of a queue over the time interval [0, ∞) (service),
the changes in the US dollar - Australian dollar exchange rate from 2006 onwards (finance),
the positions of all iPhones that make connections to a particular telecommunications company over the course of one hour (wireless tower placement),
the network friend structure of Facebook (ad revenue).
Slide 6
Stochastic modelling
We study a random experiment in the context of a Probability Space (Ω, F, P). Here,
the sample space Ω is the set of all possible outcomes of our random experiment,
the class of events F is a set of subsets of Ω. We view these as events we can see or measure, and
P is a probability measure defined on the elements of F.
Slide 7
The sample space
We need to think about the sets of possible outcomes for the random experiments. For those discussed above, these could be
{H, T}, {(H, H), (H, T), (T, H), (T, T)}, the set of all infinite sequences of Hs and Ts,
[0, ∞),
the set of piecewise-constant functions from [0, ∞) to Z+,
the set of continuous functions from [0, ∞) to IR+,
∪_{n≥0} {((x1, y1), . . . , (xn, yn))}, giving the locations of the phones when they connected,
the set of simple networks with number of vertices equal to the number of users: edges connect friends.
Slide 8
Review of basic notions of set theory
A ⊆ B: A is a subset of B, or: if A occurs, then B occurs.
A ∪ B = {ω : ω ∈ A or ω ∈ B} = B ∪ A. Union of sets (events): at least one occurs. A1 ∪ A2 ∪ · · · ∪ An = ∪_{i=1}^n Ai.
A ∩ B = {ω : ω ∈ A and ω ∈ B} = B ∩ A = AB. Intersection of sets (events): both occur. A1 ∩ A2 ∩ · · · ∩ An = ∩_{i=1}^n Ai.
A^c = {ω : ω ∉ A}. Complement of a set/event: the event doesn't occur.
∅: the empty set or impossible event.
Slide 9
The class of events F
For discrete sample spaces, F is typically the set of all subsets.
Example: Toss a coin once, F = {∅, {H}, {T}, {H, T}}.
For continuous sample spaces, the situation is more complicated:
Slide 10
The class of events F
Let S be the circle of radius 1.
We say two points on S are in the same family if you can get from one to the other by taking steps of arclength 1 around the circle. Each family chooses a single member to be head.
If X is a point chosen uniformly at random from the circle, what is the chance X is the head of its family?
Slide 11
The class of eventsF
A= {X is head of its family}. Ai= {X is isteps clockwise from its family}. Bi= {X is i steps counterclockwise from its family}. By uniformity, P(A) =P(Ai) =P(Bi),BUT
law of total probability:
1 =P(A) +i=1
(P(Ai) +P(Bi))!
The issue is that the event A is not one we can seeormeasuresoshould not be included inF.
Slide 12
The class of eventsF
These kinds of issues are technical to resolve and are dealt with in
later probability or analysis subjects which use measure theory.
Slide 13
The probability measure P
The probability measure P on (Ω, F) is a set function from F to [0, 1] satisfying
P1. P(A) ≥ 0 for all A ∈ F [probabilities measure long-run percentages or certainty],
P2. P(Ω) = 1 [there is a 100% chance something happens],
P3. countable additivity: if A1, A2, . . . are disjoint events in F, then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) [think about it in terms of frequencies].
Slide 14
How do we specify P?
The modelling process consists of
defining the values of P(A) for some basic events A ∈ F, and
deriving P(B) for the other, more complicated events B ∈ F from the axioms above.
Example: Toss a fair coin 1000 times. Any length-1000 sequence of Hs and Ts has chance 2^(-1000).
What is the chance there are more than 600 Hs in the sequence?
What is the chance the first time the proportion of heads exceeds the proportion of tails occurs after toss 20?
Slide 15
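To make the first question concrete, the tail probability of more than 600 heads can be computed exactly by summing the binomial pmf; a quick sketch (the variable names are ours, not from the notes):

```python
from math import comb

# P(more than 600 heads in 1000 tosses of a fair coin):
# sum C(1000, k) / 2^1000 over k = 601..1000.
n = 1000
p_more_than_600 = sum(comb(n, k) for k in range(601, n + 1)) / 2**n
print(p_more_than_600)  # tiny (order 10^-10): such a deviation is extremely unlikely
```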
Properties of P
P(∅) = 0.
P(A^c) = 1 - P(A).
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Slide 16
Conditional probability
Let A, B ∈ F be events with P(B) > 0. Supposing we know that B occurred, how likely is A given that information? That is, what is the conditional probability P(A|B)?
For a frequency interpretation, consider the situation where we have n trials and B has occurred nB times. What is the relative frequency of A in these nB trials? The answer is
nAB/nB = (nAB/n)/(nB/n) ≈ P(A ∩ B)/P(B).
Hence, we define
P(A|B) = P(A ∩ B)/P(B).
We need a more sophisticated definition if we want to define the conditional probability P(A|B) when P(B) = 0.
Slide 17
Example:
Tickets are drawn consecutively and without replacement from a box of tickets numbered 1-10. What is the chance the second ticket is even numbered given the first is
even?
labelled 3?
Slide 18
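A brute-force check of both answers, enumerating all equally likely ordered pairs of distinct tickets (a sketch; the notes would solve this by direct conditioning):

```python
from fractions import Fraction
from itertools import permutations

# All ordered (first, second) draws without replacement from tickets 1..10.
pairs = list(permutations(range(1, 11), 2))

def cond_prob(event, given):
    # P(event | given) as an exact fraction over equally likely pairs
    g = [pr for pr in pairs if given(pr)]
    return Fraction(sum(1 for pr in g if event(pr)), len(g))

second_even = lambda pr: pr[1] % 2 == 0
print(cond_prob(second_even, lambda pr: pr[0] % 2 == 0))  # 4/9
print(cond_prob(second_even, lambda pr: pr[0] == 3))      # 5/9
```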
Bayes' formula
Law of Total Probability: let B1, B2, . . . , Bn be mutually exclusive events with A ⊆ ∪_{j=1}^n Bj; then
P(A) = Σ_{j=1}^n P(A|Bj)P(Bj).
With the same assumptions as for the Law of Total Probability,
P(Bj|A) = P(Bj ∩ A)/P(A) = P(A|Bj)P(Bj) / Σ_{k=1}^n P(A|Bk)P(Bk).
Slide 19
Example:
A disease affects 1/1000 newborns and shortly after birth a baby is screened for this disease using a cheap test that has a 2% false positive rate (the test has no false negatives). If the baby tests positive, what is the chance it has the disease?
Slide 20
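The screening example is a direct application of Bayes' formula with B1 = {diseased}, B2 = {healthy}; a small sketch:

```python
# Bayes: P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)]
p_disease = 1 / 1000
p_pos_given_disease = 1.0   # no false negatives
p_pos_given_healthy = 0.02  # 2% false positive rate

p_disease_given_pos = (p_pos_given_disease * p_disease) / (
    p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
)
print(round(p_disease_given_pos, 4))  # 0.0477: most positives are false positives
```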
Independent events
Events A and B are said to be independent if P(A ∩ B) = P(A)P(B). If P(B) ≠ 0 or P(A) ≠ 0, then this is the same as P(A|B) = P(A) or P(B|A) = P(B), respectively.
Events A1, . . . , An are independent if, for any subset {i1, . . . , ik} of {1, . . . , n},
P(Ai1 ∩ · · · ∩ Aik) = P(Ai1) · · · P(Aik).
Slide 21
Random variables
A random variable (rv) on a probability space (Ω, F, P) is a function X : Ω → IR. Usually, we want to talk about the probabilities that the values of random variables lie in sets of the form (a, b) = {x : a < x < b}.
Distribution Functions
The function FX(t) = P(X ≤ t) = P({ω : X(ω) ∈ (-∞, t]}), which maps R to [0, 1], is called the distribution function of the random variable X. Any distribution function F
F1. is non-decreasing,
F2. is such that F(x) → 0 as x → -∞ and F(x) → 1 as x → ∞, and
F3. is right-continuous, that is, lim_{h→0+} F(t + h) = F(t) for all t.
Slide 23
Distribution Functions
We say that
the random variable X is discrete if it can take only countably-many values, with P(X = xi) = pi > 0 and Σ_i pi = 1. Its distribution function FX(t) is commonly a step function.
the random variable X is continuous if FX(t) is absolutely continuous, that is, if there exists a function fX(t) that maps R to R+ such that FX(t) = ∫_{-∞}^t fX(u) du.
A mixed random variable has some points that have positive probability and also some continuous parts.
Slide 24
Examples of distributions
Examples of discrete random variables: binomial, Poisson, geometric, negative binomial, discrete uniform
http://en.wikipedia.org/wiki/Category:Discrete_distributions
Examples of continuous random variables: normal, exponential, gamma, beta, uniform on an interval (a, b)
http://en.wikipedia.org/wiki/Category:Continuous_distributions
Slide 25
Random Vectors
A random vector X = (X1, . . . , Xd) is a measurable mapping of (Ω, F) to IR^d, that is, for each Borel set B ⊆ IR^d, {ω : X(ω) ∈ B} ∈ F. The distribution function of a random vector is
FX(t) = P(X1 ≤ t1, . . . , Xd ≤ td), t = (t1, . . . , td) ∈ R^d.
It follows that probabilities of the form P(s1 < X1 ≤ t1, . . . , sd < Xd ≤ td) can be computed from FX.
Independent Random Variables
The random variables X1, . . . , Xd are called independent if FX(t) = FX1(t1) · · · FXd(td) for all t = (t1, . . . , td). Equivalently,
the events {X1 ∈ B1}, . . . , {Xd ∈ Bd} are independent for all Borel sets B1, . . . , Bd ⊆ R, or, in the absolutely-continuous case,
fX(t) = fX1(t1) · · · fXd(td) for all t = (t1, . . . , td).
Slide 27
Revision Exercise
For bivariate random variables (X, Y) with density function
f(x, y) = 2x + 2y - 4xy for 0 < x < 1, 0 < y < 1.
Expectation of X
For a discrete, continuous or mixed random variable X that takes on values in the set SX, the expectation of X is
E(X) = ∫_{SX} x dFX(x).
The integral on the right-hand side is a Lebesgue-Stieltjes integral. It can be evaluated as
Σ_i xi P(X = xi), if X is discrete,
∫ x fX(x) dx, if X is absolutely continuous.
In second year, we required that the integral be absolutely convergent. In this course we will allow the expectation to be infinite, provided that we never get in a situation where we have ∞ - ∞.
Slide 29
Expectation of g(X)
For a measurable function g that maps SX to some other set SY, Y = g(X) is a random variable taking values in SY and
E(Y) = E(g(X)) = ∫_{SX} g(x) dFX(x).
We can also evaluate E(Y) by calculating its distribution function FY(y) and then using the expression
E(Y) = ∫_{SY} y dFY(y).
Slide 30
Properties of Expectation
E(aX + bY) = aE(X) + bE(Y).
If X ≤ Y, then E(X) ≤ E(Y).
If X ≡ c, then E(X) = c.
Slide 31
Moments
The kth moment of X is E(X^k).
The kth central moment of X is E[(X - E(X))^k].
The variance V(X) of X is the second central moment, E(X^2) - (E(X))^2.
V(cX) = c^2 V(X).
If X and Y have finite means and are independent, then E(XY) = E(X)E(Y).
If X and Y are independent (or uncorrelated), then V(X ± Y) = V(X) + V(Y).
Slide 32
Conditional Probability
The conditional probability of event A given X is a random variable (since it is a function of X). We write it as P(A|X).
For a real number x, if P(X = x) > 0, then
P(A|x) = P(A ∩ {X = x})/P({X = x}).
If P(X = x) = 0, then
P(A|x) = lim_{ε→0+} P(A ∩ {X ∈ (x - ε, x + ε)})/P({X ∈ (x - ε, x + ε)}).
Slide 33
Conditional Distribution
The conditional distribution function FY|X(y|X) of Y evaluated at the real number y is given by P({Y ≤ y}|X), where P({Y ≤ y}|x) is defined on the previous slide.
If (X, Y) is absolutely continuous, then the conditional density of Y given that X = x is fY|X(y|x) = f(X,Y)(x, y)/fX(x).
Slide 34
Conditional Expectation
The conditional expectation E(Y|X) = ψ(X), where
ψ(x) = E(Y|X = x)
= Σ_j yj P(Y = yj|X = x), if Y is discrete,
= ∫_{SY} y fY|X(y|x) dy, if Y is absolutely continuous.
Slide 35
Properties of Conditional Expectation
Linearity: E(aY1 + bY2|X) = aE(Y1|X) + bE(Y2|X).
Monotonicity: if Y1 ≤ Y2, then E(Y1|X) ≤ E(Y2|X).
E(c|X) = c.
E(E(Y|X)) = E(Y).
For any measurable function g, E(g(X)Y|X) = g(X)E(Y|X).
E(Y|X) is the best predictor of Y from X in the mean-square sense. This means that, for all random variables Z = g(X), the expected quadratic error E((g(X) - Y)^2) is minimised when g(X) = E(Y|X) (see Borovkov, page 57).
Slide 36
Exercise
Let Ω = {a, b, c, d}, P({a}) = 1/2, P({b}) = P({c}) = 1/8 and P({d}) = 1/4. Define random variables
Y(ω) = 1 if ω = a or b, 0 if ω = c or d,
X(ω) = 2 if ω = a or c, 5 if ω = b or d.
Compute E(X), E(X|Y) and E(E(X|Y)).
Slide 37
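A direct computation of this exercise in exact arithmetic; the tower property E(E(X|Y)) = E(X) serves as a built-in check (sketch code, names are ours):

```python
from fractions import Fraction as F

P = {'a': F(1, 2), 'b': F(1, 8), 'c': F(1, 8), 'd': F(1, 4)}
Y = {'a': 1, 'b': 1, 'c': 0, 'd': 0}
X = {'a': 2, 'b': 5, 'c': 2, 'd': 5}

EX = sum(P[w] * X[w] for w in P)

def E_X_given_Y(y):
    # E(X | Y = y) = sum over {w : Y(w) = y} of X(w) P(w) / P(Y = y)
    py = sum(P[w] for w in P if Y[w] == y)
    return sum(P[w] * X[w] for w in P if Y[w] == y) / py

tower = sum(sum(P[w] for w in P if Y[w] == y) * E_X_given_Y(y) for y in (0, 1))
print(EX, E_X_given_Y(1), E_X_given_Y(0), tower)  # 25/8, 13/5, 4, 25/8
```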
Example
The number of storms, N, in the upcoming rainy season is distributed according to a Poisson distribution with a parameter value Λ that is itself random. Specifically, Λ is uniformly distributed over (0, 5). The distribution of N is called a mixed Poisson distribution.
1. Find the probability there are at least two storms this season.
2. Calculate E(N|Λ) and E(N^2|Λ).
3. Derive the mean and variance of N.
Slide 38
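The first part can be checked numerically: given Λ = λ, N is Poisson(λ), so P(N ≥ 2) = 1 − E[e^(−Λ)(1 + Λ)], and integrating against the uniform density 1/5 on (0, 5) gives the closed form (3 + 7e^(−5))/5. A sketch using a midpoint Riemann sum (our code, not from the notes):

```python
from math import exp

# P(N >= 2) = 1 - (1/5) * integral over (0, 5) of e^(-lam) (1 + lam) d lam,
# evaluated with the midpoint rule.
m = 100000
h = 5 / m
integral = sum(exp(-(k + 0.5) * h) * (1 + (k + 0.5) * h) for k in range(m)) * h
p_at_least_2 = 1 - integral / 5
print(round(p_at_least_2, 4))  # 0.6094; closed form (3 + 7*e**-5)/5

# For parts 2-3: E(N | Lambda) = Lambda, so E(N) = E(Lambda) = 2.5, and
# Var(N) = E(Lambda) + Var(Lambda) = 2.5 + 25/12.
```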
Exercise
The joint density of X and Y is given by
fX,Y(x, y) = e^(-x/y) e^(-y) / y, x > 0, y > 0.
Calculate E[X|Y] and then calculate E[X].
Slide 39
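The density factors as fY(y) = e^(-y) (so Y is Exp(1)) times fX|Y(x|y) = e^(-x/y)/y (so, given Y = y, X is exponential with mean y). Hence E[X|Y] = Y and E[X] = E[Y] = 1. A Monte Carlo sanity check under that reading (a sketch, not the notes' derivation):

```python
import random

random.seed(1)

# Simulate (X, Y): Y ~ Exp(1); given Y = y, X ~ Exp(mean y), i.e. rate 1/y.
n = 100000
xs = []
for _ in range(n):
    y = random.expovariate(1.0)
    xs.append(random.expovariate(1.0 / y))

print(sum(xs) / n)  # should be close to E[X] = E[Y] = 1
```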
Limit Theorems (Borovkov §2.9)
The Law of Large Numbers (LLN) states that if X1, X2, . . . are independent and identically-distributed with mean μ, then
X̄n ≡ (1/n) Σ_{j=1}^n Xj → μ
as n → ∞.
In the strong form, this is true almost surely, which means that it is true on a set A of sequences x1, x2, . . . that has probability one. In the weak form, this is true in probability, which means that, for all ε > 0,
P(|X̄n - μ| > ε) → 0 as n → ∞.
Slide 40
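A quick simulation illustrating the LLN for Xi uniform on (0, 1), which have mean μ = 1/2: sample means settle near μ as n grows (sketch code, names are ours):

```python
import random

random.seed(0)

def sample_mean(n):
    # X-bar_n for n i.i.d. Uniform(0, 1) draws, which have mean 1/2
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, sample_mean(n))  # the sample means approach 0.5
```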
Limit Theorems (Borovkov §2.9)
The Central Limit Theorem (CLT) states that if X1, X2, . . . are independent and identically-distributed with mean μ and variance σ^2, then for any x,
P( (X̄n - μ)/(σ/√n) ≤ x ) → Φ(x)
as n → ∞, where Φ is the standard normal distribution function.
The Poisson Limit Theorem states that if X1, X2, . . . are independent Bernoulli random variables with P(Xi = 1) = 1 - P(Xi = 0) = pi, then X1 + X2 + · · · + Xn is well-approximated by a Poisson random variable with parameter λ = p1 + · · · + pn. Specifically, with W = X1 + X2 + · · · + Xn, then, for any Borel set B ⊆ R,
P(W ∈ B) ≈ P(Y ∈ B), where Y ~ Po(λ).
There is, in fact, a bound on the accuracy of this approximation:
|P(W ∈ B) - P(Y ∈ B)| ≤ Σ_{i=1}^n pi^2 / max(1, λ).
Slide 42
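The approximation can be checked exactly for small n by convolving the Bernoulli distributions and comparing with the Po(λ) pmf in total variation; the distance also respects the cruder Le Cam bound Σ pi² (a sketch with illustrative pi of our choosing; the bound on the slide is sharper):

```python
from math import exp, factorial

ps = [0.1, 0.05, 0.2, 0.02, 0.15]   # illustrative Bernoulli success probabilities
lam = sum(ps)

# Distribution of W = X1 + ... + Xn by convolution of the Bernoullis.
dist = [1.0]
for p in ps:
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)
        new[k + 1] += q * p
    dist = new

poisson = [exp(-lam) * lam**k / factorial(k) for k in range(len(dist))]

# Total variation distance over {0, ..., n} (the Poisson tail beyond n is tiny).
tv = 0.5 * sum(abs(w - y) for w, y in zip(dist, poisson))
print(tv, sum(p * p for p in ps))  # tv sits below the Le Cam bound sum(pi^2)
```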
Example
Suppose there are three ethnic groups, A (20%), B (30%) and C (50%), living in a city with a large population. Suppose 0.5%, 1% and 1.5% of people in A, B and C respectively are over 200cm tall.
If we know that of 300 selected, 50, 50 and 200 people are from A, B and C, what is the probability that at least four will be over 200cm?
Slide 43
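Under the slide's numbers, the count of tall people is a sum of 300 independent Bernoullis, with λ = 50(0.005) + 50(0.01) + 200(0.015) = 3.75, so the Poisson Limit Theorem gives P(at least four) ≈ 1 − Σ_{k=0}^3 e^(−λ) λ^k / k!. A sketch:

```python
from math import exp, factorial

# Poisson approximation: lambda = sum of the 300 Bernoulli parameters.
lam = 50 * 0.005 + 50 * 0.01 + 200 * 0.015   # = 3.75
p_at_least_4 = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(4))
print(round(p_at_least_4, 4))  # 0.5162
```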
Stochastic Processes (Borovkov §2.10)
A collection of random variables {Xt, t ∈ T} (or {X(t), t ∈ T}) on a common probability space (Ω, F, P) is called a stochastic process. The index variable t is often called time.
If T = {1, 2, . . .} or {. . . , -2, -1, 0, 1, 2, . . .}, the process is a discrete-time process.
If T = IR or [0, ∞), the process is a continuous-time process.
If T = IR^d, then the process is a spatial process, for example temperature at t ∈ T ⊆ IR^2, which could be, say, the University campus.
Slide 44
Examples of Stochastic Processes
If Xt ~ N(0, 1) for all t, then Xt is called a Gaussian process. Different processes can be modelled by making different assumptions about the dependence between the Xt for different t.
Standard Brownian Motion is a Gaussian process where, for any 0 ≤ s1 < t1 ≤ s2 < t2 ≤ · · ·, the increments Xt1 - Xs1, Xt2 - Xs2, . . . are independent, with Xt - Xs ~ N(0, t - s).
Xt is the number of sales of an item up to time t.
{Xt, t ≥ 0} is called a counting process.
Slide 46
Examples of Stochastic Processes
Xt is the number of people in a queue at time t.
Slide 47
Interpretations
We can think of Ω as consisting of the set of sample paths ω = {Xt : t ∈ T}, that is, a set of sequences if T is discrete or a set of functions if T is continuous. Each ω has a value at each time point t ∈ T. With this interpretation,
For a fixed ω, we can think of t as a variable and X(t) as a deterministic function (realization, trajectory, sample path) of the process.
If we allow ω to vary, we get a collection of trajectories.
For fixed t, with ω varying, we see that Xt(ω) is a random variable.
If both ω and t are fixed, then Xt(ω) is a real number.
Slide 48
Examples of Stochastic Processes
If Xt is a counting process:
For fixed ω, Xt(ω) is a non-decreasing step function of t.
For fixed t, Xt(ω) is a non-negative integer-valued random variable.
For s < t, Xt - Xs is the number of events counted in the interval (s, t].
Knowing just the one-dimensional (individual) distributions of Xt for all t is not enough to describe a stochastic process. To specify the complete distribution of a stochastic process {Xt, t ∈ T}, we need to know the finite-dimensional distributions, that is, the family of joint distribution functions F_{t1, t2, . . . , tk}(x1, . . . , xk) of Xt1, . . . , Xtk for all k ≥ 1 and t1, . . . , tk ∈ T.
Slide 50
Discrete-Time Markov Chains
We are frequently interested in applications where we have a sequence X1, X2, . . . of outputs (which we model as random variables) in discrete time. For example,
DNA: A (adenine), C (cytosine), G (guanine), T (thymine).
Texts: Xj takes values in some alphabet, for example {A, B, . . . , Z, a, . . .}. Developing and testing compression software. Cryptology: codes, encoding and decoding. Attributing manuscripts.
Slide 51
Independence?
Is it reasonable to assume that neighbouring letters are independent?
Text T = {a1 · · · an} of length n. Let nα = #{i ≤ n : ai = α} and nαβ = #{i ≤ n - 1 : ai ai+1 = αβ}. Assuming that T is random, we expect nα/n ≈ P(letter = α) and nαβ/n ≈ P(two letters = αβ).
If letters were independent, we would have P(two letters = αβ) = P(letter = α)P(letter = β), so we would expect that nαβ/n ≈ (nα/n)(nβ/n).
However, let αβ = aa: P(letter = a) ≈ 0.08, but aa is very rare. We conclude that assuming independence does not lead to a good model for text.
Slide 52
The Markov Property
The Markov property embodies a natural first generalisation of the independence assumption. It assumes a kind of one-step dependence or memory. Specifically, for all Borel sets B,
P(Xn+1 ∈ B | Xn = xn, Xn-1 = xn-1, . . .) = P(Xn+1 ∈ B | Xn = xn).
Slide 53
Discrete-Time Markov Chains
A random sequence {Xn, n ≥ 0} with a countable state space (without loss of generality {1, 2, . . .}) forms a DTMC if
P(Xn+1 = k | Xn = j, Xn-1 = xn-1, . . . , X0 = x0) = P(Xn+1 = k | Xn = j).
This enables us to write
P(Xn+1 = k | Xn = j) = pjk(n).
Furthermore, we commonly assume that the transition probabilities pjk(n) do not depend on n, in which case the DTMC is called homogeneous (more precisely, time homogeneous) and we write pjk(n) = pjk.
Slide 54
Discrete-Time Markov Chains
We can associate a directed graph with a DTMC by letting the nodes correspond to states and putting in an arc j → k if pjk > 0.
Slide 56
Discrete-Time Markov Chains
For a transition matrix of a DTMC:
Each entry is ≥ 0.
Each row sums to 1.
Any square matrix having these two properties is called a stochastic matrix.
Slide 57
Discrete-Time Markov Chains
Examples:
If the {Xn} are independent and identically-distributed random variables with P(Xi = k) = pk, what is the transition matrix of the DTMC?
A communication system transmits the digits 0 and 1. At each time point, there is a probability p that the digit will not change and probability 1 - p that it will change.
Slide 58
Discrete-Time Markov Chains
Suppose that whether or not it rains tomorrow depends on previous weather conditions only through whether or not it is raining today, and not on past weather conditions. Suppose also that if it rains today, then it will rain tomorrow with probability p, and if it does not rain today, then it will rain tomorrow with probability q. If we say that the process is in state 0 when it rains and state 1 when it does not rain, then the above is a two-state Markov chain.
A simple random walk. Let a sequence of random variables {Xn} ⊆ Z be defined by Xn+1 = Xn + Yn+1, where {Yn} are independent and identically-distributed random variables with P(Yn = 1) = p, P(Yn = -1) = 1 - p.
Slide 59
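For the two-state rain chain, the n-step probabilities have a closed form that can be checked against matrix powers: with P = [[p, 1-p], [q, 1-q]] one finds p(n)00 = π0 + (p - q)^n (1 - π0), where π0 = q/(1 - p + q) is the long-run probability of rain. A sketch with illustrative values p = 0.7, q = 0.4 (our choice, not the notes'):

```python
def mat_mult(A, B):
    # Multiply two 2x2 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p, q = 0.7, 0.4                  # illustrative transition probabilities
P = [[p, 1 - p], [q, 1 - q]]     # state 0 = rain, state 1 = no rain

Pn = [[1.0, 0.0], [0.0, 1.0]]    # identity = P^0
pi0 = q / (1 - p + q)            # long-run probability of rain, here 4/7
for n in range(1, 11):
    Pn = mat_mult(Pn, P)
    closed = pi0 + (p - q) ** n * (1 - pi0)
    assert abs(Pn[0][0] - closed) < 1e-12   # matrix power matches closed form
print(Pn[0][0])  # after 10 steps, already close to pi0 = 4/7
```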
Discrete-Time Markov Chains
The n-step transition probabilities P(Xm+n = j | Xm = i) of a homogeneous DTMC do not depend on m. For n = 1, 2, . . . , we denote them by
p(n)ij = P(Xm+n = j | Xm = i).
It is also convenient to use the notation
p(0)ij := 1 if j = i, 0 if j ≠ i.
Slide 60
Discrete-Time Markov Chains
The Chapman-Kolmogorov equations show how we can calculate the p(n)ij from the pij. For n = 1, 2, . . . and any r = 1, 2, . . . , n,
p(n)ij = Σ_k p(r)ik p(n-r)kj.
Slide 61
Discrete-Time Markov Chains
If we define the n-step transition matrix P(n) as the matrix with (i, j) entry p(n)ij, then the Chapman-Kolmogorov equations can be written in the matrix form
P(n) = P(r) P(n-r)
with P(1) = P. By mathematical induction, it follows that
P(n) = P^n,
the nth power of P.
Slide 62
Discrete-Time Markov Chains
How do we determine the distribution of a DTMC? We have
the initial distribution π0 = (π0,1, π0,2, . . .), where π0,j = P(X0 = j) for all j, and
the transition matrix P.
In principle, we can use these and the Markov property to derive the finite-dimensional distributions, although the calculations are frequently intractable. For k ≥ 1 and t1 < · · · < tk, the joint distribution of (Xt1, . . . , Xtk) can be written in terms of π0 and products of entries of powers of P.
Example
Suppose P(X0 = 1) = 1/3, P(X0 = 2) = 0, P(X0 = 3) = 1/2, P(X0 = 4) = 1/6 and
P =
1/4 0 1/4 1/2
1/4 1/4 1/4 1/4
0 0 2/3 1/3
1/2 0 1/2 0
Find the distribution of X1,
Calculate P(Xn+2 = 2 | Xn = 4), and
Calculate P(X3 = 2, X2 = 3, X1 = 1).
Slide 64
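The three quantities reduce to the products π1 = π0 P, the (4, 2) entry of P², and P(X1 = 1) p13 p32; a sketch computing them in exact arithmetic (states are 1-based on the slide, 0-indexed in code):

```python
from fractions import Fraction as F

P = [
    [F(1, 4), F(0), F(1, 4), F(1, 2)],
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
    [F(0), F(0), F(2, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2), F(0)],
]
pi0 = [F(1, 3), F(0), F(1, 2), F(1, 6)]

# Distribution of X1: pi1 = pi0 * P
pi1 = [sum(pi0[i] * P[i][j] for i in range(4)) for j in range(4)]

# P(X_{n+2} = 2 | X_n = 4) = (P^2) entry (4, 2); 1-based states -> indices [3][1]
p2_42 = sum(P[3][k] * P[k][1] for k in range(4))

# P(X3 = 2, X2 = 3, X1 = 1) = P(X1 = 1) * p13 * p32
path = pi1[0] * P[0][2] * P[2][1]

print(pi1, p2_42, path)  # pi1 = (1/6, 0, 1/2, 1/3); the last two are 0 since p12 = p32 = 0
```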
Discrete-Time Markov Chains
Fundamental questions that we quite often want to ask are:
What proportion of time does the chain spend in each state in the long run?
Does this question even make sense?
The answer depends on the classification of states.
Slide 65
Discrete-Time Markov Chains
Here are some definitions.
State k is accessible from state j, denoted by j → k, if there exists an n ≥ 1 such that p(n)jk > 0. That is, there exists a path j = i0, i1, i2, . . . , in = k such that p_{i0 i1} p_{i1 i2} · · · p_{i(n-1) in} > 0.
If j → k and k → j, then states j and k communicate, denoted by j ↔ k.
State j is called non-essential if there exists a state k such that j → k but k ↛ j.
State j is called essential if j → k implies that k → j.
A state j is an absorbing state if pjj = 1. An absorbing state is essential, but essential states do not have to be absorbing.
Slide 66
Discrete-Time Markov Chains
Example
Draw a transition diagram and then classify the states of a DTMC with transition matrix
P =
0 0.5 0.5 0
0.5 0 0 0.5
0 0 0.5 0.5
0 0 0.5 0.5
Slide 67
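The classification can be automated: compute reachability (a transitive closure), group mutually reachable states into communicating classes, and mark a state essential when every state it leads to leads back. A sketch applied to the matrix above (our code, not from the notes):

```python
P = [
    [0, 0.5, 0.5, 0],
    [0.5, 0, 0, 0.5],
    [0, 0, 0.5, 0.5],
    [0, 0, 0.5, 0.5],
]
n = len(P)

# reach[j][k] = True iff j -> k in one or more steps (Floyd-Warshall closure)
reach = [[P[j][k] > 0 for k in range(n)] for j in range(n)]
for m in range(n):
    for j in range(n):
        for k in range(n):
            reach[j][k] = reach[j][k] or (reach[j][m] and reach[m][k])

# Communicating classes: states that reach each other.
classes = []
for j in range(n):
    cls = frozenset(k for k in range(n) if reach[j][k] and reach[k][j])
    if cls not in classes:
        classes.append(cls)

# A state j is essential iff j -> k always implies k -> j.
essential = {j: all(reach[k][j] for k in range(n) if reach[j][k]) for j in range(n)}
print(classes, essential)  # {0,1} communicate but are non-essential; {2,3} are essential
```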
Discrete-Time Markov Chains
A state j which is such that j ↛ j is called ephemeral. Ephemeral states usually don't add anything to a DTMC model and we are going to assume that there are no such states. With this assumption, the communication relation ↔ has the properties
j ↔ j (reflexivity),
j ↔ k if and only if k ↔ j (symmetry), and
if j ↔ k and k ↔ i, then j ↔ i (transitivity).
A relation that satisfies these properties is known as an equivalence relation.
Slide 68
Discrete-Time Markov Chains
Consider a set S whose elements can be related to each other via any equivalence relation. Then S can be partitioned into a collection of disjoint subsets S1, S2, S3, . . . , SM (where M might be infinite) such that j, k ∈ Sm implies that j ↔ k.
So the state space of a DTMC is partitioned into communicating classes by the communication relation.
Slide 69
Discrete-Time Markov Chains
An essential state cannot be in the same communicating class as a non-essential state. This means that we can divide the sets in the partition S1, S2, S3, . . . , SM into a collection S^n_1, S^n_2, S^n_3, . . . , S^n_{Mn} of non-essential communicating classes and a collection S^e_1, S^e_2, S^e_3, . . . , S^e_{Me} of essential communicating classes.
If the DTMC starts in a state from a non-essential communicating class S^n_m, then once it leaves, it can never return. On the other hand, if the DTMC starts in a state from an essential communicating class S^e_m, then it can never leave it.
Definition: If a DTMC has only one communicating class, that is, all states communicate with each other, then it is called an irreducible DTMC.
Slide 70
Discrete-Time Markov Chains
Example
Classify the states of the DTMC with
P =
0.5 0.5 0 0
0.5 0.5 0 0
0.25 0.15 0.45 0.15
0 0 0 1
Slide 71
Discrete-Time Markov Chains
Exercise
Classify the states of the DTMC with (+ denotes a positive entry)
P =
0 0 + 0 0 0 +
0 + 0 + 0 0 +
+ 0 0 0 0 0 0
0 0 0 + 0 0 0
0 + 0 0 0 0 0
0 + 0 0 + + 0
0 0 + 0 0 0 +
Slide 72
Discrete-Time Markov Chains
Now let's revisit the random walk example where Xn+1 = Xn + Yn+1, with {Yn} independent and identically-distributed, X0 = 0, P(Yn = 1) = p and P(Yn = -1) = 1 - p = q. This DTMC is irreducible and so all states are essential. However,
if p > q, then E(Xn - X0) = n(p - q) > 0, so Xn will drift to infinity, at least in expectation.
For each fixed state j, with probability one, the DTMC will visit j only finitely many times.
A state's long-run behaviour is not captured by being essential in this case: we need a further classification of the states.
Slide 73
Recurrence and Transience of States
Let X0 = j and Ti(j) be the time between the (i - 1)st and ith return to state j. Then T1(j), T2(j), . . . are independent and identically distributed random variables.
Slide 74
Recurrence and Transience of States
Our further classification relies on calculating the probability that the DTMC returns to a state once it has left. Let
fj = P(Xn = j for some n > 0 | X0 = j) = P(T1(j) < ∞ | X0 = j).
The state j is said to be recurrent if fj = 1 and transient if fj < 1.
If the DTMC starts in a recurrent state j then, with probability one, it will eventually re-enter j. At this point, the process will start anew (by the Markov property) and it will re-enter again with probability one. So the DTMC will (with probability one) visit j infinitely-many times.
If the DTMC starts in a transient state j, then there is a probability 1 - fj > 0 that it will never return. So, letting Nj be the number of visits to state j after starting there, we see that Nj has a geometric distribution. Specifically, for n ≥ 0,
P(Nj = n | X0 = j) = fj^n (1 - fj).
It follows that j is recurrent if and only if the expected number of returns Σ_{n=1}^∞ p(n)jj is infinite.
Recurrence is a class property. Suppose j is recurrent and j ↔ k, so there exist s and t with p(s)jk > 0 and p(t)kj > 0. Then
p(s+n+t)jj = P(Xs+n+t = j | X0 = j) ≥ P(Xs+n+t = j, Xs+n = k, Xs = k | X0 = j) = p(s)jk p(n)kk p(t)kj.
Similarly, p(s+n+t)kk ≥ p(t)kj p(n)jj p(s)jk, and so, for n > s + t,
α p(n-s-t)jj ≤ p(n)kk ≤ p(n+s+t)jj / α,
where α = p(s)jk p(t)kj. So the series Σ_{n=1}^∞ p(n)kk must diverge because Σ_{n=1}^∞ p(n)jj diverges, and we conclude that state k is also recurrent.
Slide 78
If the Markov chain is irreducible, then all states are either recurrent or transient and so it's appropriate to refer to the chain as either recurrent or transient.
Slide 79
The Random Walk
Let Xn+1 = Xn + Yn+1, where {Yn : n ≥ 1} are independent and identically-distributed random variables with P(Yn = 1) = p and P(Yn = -1) = 1 - p = q. We can compute the m-step transition probabilities from state j to itself by observing that these probabilities are zero if m is odd and equal to
C(2n, n) p^n q^n
if m = 2n.
Slide 80
The Random Walk
Stirling's formula n! ~ √(2πn) n^n e^(-n) gives us the fact that
p(2n)jj ~ (4pq)^n / √(πn),
and the series Σ_{n=1}^∞ p(2n)jj
diverges if p = q = 1/2, so the DTMC is recurrent,
converges if p ≠ q (compare to a geometric series, since then 4pq < 1), so the DTMC is transient.
Slide 81
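The dichotomy is visible numerically: partial sums of p(2n)jj = C(2n, n) (pq)^n keep growing when p = 1/2 but level off when p ≠ q. A sketch (the terms are built up via a ratio so no huge binomial coefficients are formed; the comparison values use the identity Σ_{n≥0} C(2n, n) x^n = 1/√(1 − 4x)):

```python
def partial_sum(p, terms):
    # sum of C(2n, n) (pq)^n for n = 1..terms, using the term ratio
    # a_{n+1}/a_n = 2(2n+1)/(n+1) * pq to stay in float range
    q = 1 - p
    a = 2 * p * q            # n = 1 term: C(2, 1) p q
    total = a
    for n in range(1, terms):
        a *= 2 * (2 * n + 1) / (n + 1) * p * q
        total += a
    return total

s_half = partial_sum(0.5, 2000)    # keeps growing like a multiple of sqrt(terms)
s_biased = partial_sum(0.6, 2000)  # settles at 1/sqrt(1 - 4pq) - 1 = 4
print(s_half, s_biased)
```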
Periodicity
The Polya random walk illustrates another phenomenon that can occur in DTMCs - periodicity.
Definition: State j is periodic with period d > 1 if {n : p(n)jj > 0} is non-empty and has greatest common divisor d.
If state j has period 1, then we say that it is aperiodic.
Slide 82
Discrete-Time Markov Chains
Examples
The Polya random walk has period d = 2 for all states j.
What is the period of the DTMC with
P =
0 0.5 0.5
1 0 0
1 0 0
?
Find the period for the DTMC with
P =
0 0 0.5 0.5
1 0 0 0
0 1 0 0
0 1 0 0
Slide 83
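The period of a state can be computed as the gcd of the return times read off boolean powers of the transition matrix; a sketch applied to both example chains (our code):

```python
from math import gcd

def period(P, j, max_n=24):
    # gcd of {n <= max_n : p(n)jj > 0}, using boolean matrix powers
    n_states = len(P)
    adj = [[P[a][b] > 0 for b in range(n_states)] for a in range(n_states)]
    power = adj                      # boolean version of P^1
    d = 0
    for n in range(1, max_n + 1):
        if power[j][j]:
            d = gcd(d, n)
        power = [[any(power[a][m] and adj[m][b] for m in range(n_states))
                  for b in range(n_states)] for a in range(n_states)]
    return d

P3 = [[0, 0.5, 0.5], [1, 0, 0], [1, 0, 0]]
P4 = [[0, 0, 0.5, 0.5], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(period(P3, 0), period(P4, 0))  # 2 and 3
```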
States in a communicating class have the same period
Assume that state j has period dj and j ↔ k. Then, as before, there must exist s and t such that p(s)jk > 0 and p(t)kj > 0. We know straightaway that dj divides s + t, since it is possible to go from j to itself in s + t steps.
Now take a path from k to itself in r steps. If we concatenate our path from j to k in s steps, this r-step path, and our path from k to j in t steps, we have an (s + r + t)-step path from j to itself. So dj divides s + r + t, which means that dj divides r. So dj divides the period dk of k.
Now we can switch j and k in the argument to conclude that dk divides dj, which means that dj = dk, and all states in the same communicating class have a common period.
Slide 84
Discrete-Time Markov Chains
The arguments on the preceding slides bring us to the following theorem, which discusses some solidarity properties of states in the same communicating class.
Theorem: In any communicating class Sr of a DTMC with state space S, the states are
either all recurrent or all transient, and
either all aperiodic or all periodic with a common period d > 1.
If states from Sr are periodic with period d > 1, then Sr = S^(1)_r ∪ S^(2)_r ∪ · · · ∪ S^(d)_r, where the DTMC passes from the subclass S^(i)_r to S^(i+1)_r (with S^(d+1)_r = S^(1)_r) with probability one at a transition.
Slide 85
Discrete-Time Markov Chains
Examples:
A chain with period $d = 4$: (transition diagram omitted)
Slide 86
Discrete-Time Markov Chains
How do we analyse a DTMC?
Draw a transition diagram.
Consider the accessibility of states, then divide the state space into essential and non-essential states.
Define the communicating classes, and divide them into recurrent and transient communicating classes.
Decide whether the classes are periodic.
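The second and third steps above can be mechanised for small chains: $i \leftrightarrow j$ exactly when each can reach the other, which a transitive closure of the "one-step" relation detects. A minimal sketch (not from the notes; the function name is my own):

```python
import numpy as np

def communicating_classes(P):
    """Partition states 0..n-1 into communicating classes (i <-> j)."""
    n = len(P)
    reach = np.asarray(P) > 0
    for k in range(n):                  # Warshall transitive closure
        for i in range(n):
            if reach[i, k]:
                reach[i] |= reach[k]    # i reaches everything k reaches
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = sorted({i} | {j for j in range(n)
                            if reach[i, j] and reach[j, i]})
        classes.append(cls)
        seen.update(cls)
    return classes
```

For the first exercise matrix all three states communicate, so there is a single (hence irreducible) class; a reducible chain such as `[[1, 0], [0.5, 0.5]]` splits into `[[0], [1]]`.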
Slide 87
Discrete-Time Markov Chains
Exercises
Analyse the DTMC with
$$P = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$
Consider a DTMC with
$$P = \begin{pmatrix} 0 & 0 & 0.5 & 0.5 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
Slide 88
Discrete-Time Markov Chains
Example
Analyse the Markov chain with states numbered 1 to 5 and with one-step transition probability matrix
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
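For a finite chain, a communicating class is recurrent exactly when it is closed (no probability leaks out of it). A quick check for this example, with states renumbered 0 to 4 (a sketch, not from the notes; class membership below is read off the transition diagram by hand):

```python
import numpy as np

# The 5-state example, states renumbered 0..4.
P = np.array([[1,   0,   0,   0, 0],
              [0.5, 0,   0.5, 0, 0],
              [0,   0.5, 0.5, 0, 0],
              [0,   0,   0,   0, 1],
              [0,   0,   0,   1, 0]], dtype=float)

def is_closed(P, cls):
    """True if no one-step transition leaves the set of states cls."""
    cls = set(cls)
    return all(P[i, j] == 0
               for i in cls for j in range(len(P)) if j not in cls)

# Communicating classes, found by inspection of the diagram:
classes = [[0], [1, 2], [3, 4]]
```

Here `[0]` and `[3, 4]` are closed, hence recurrent (state 0 is absorbing, and states 3 and 4 cycle with period 2), while `[1, 2]` can escape to state 0 and is therefore transient.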
Slide 89
Finite State DTMCs have at least one recurrent state
Recall that a state $j$ is transient if and only if
$$\sum_{n=1}^{\infty} p^{(n)}_{jj} = \sum_{n=1}^{\infty} E[I(X_n = j) \mid X_0 = j] < \infty.$$
This means that the DTMC visits $j$ only finitely many times (with probability one), given that it starts there.
Let $S$ be the set of states, and $f_{j,k}$ be the probability that the DTMC ever visits state $k$, given that it starts in state $j$.
Slide 90
Finite State DTMCs have at least one recurrent state
If all states $k \in S$ are transient, then it must be the case that, for every $j$,
$$\sum_{n=1}^{\infty} \sum_{k \in S} p^{(n)}_{jk} \le \sum_{n=1}^{\infty} p^{(n)}_{jj} + \sum_{k \neq j} f_{j,k} \sum_{n=1}^{\infty} p^{(n)}_{kk} < \infty,$$
since $S$ is finite. But $\sum_{k \in S} p^{(n)}_{jk} = 1$ for every $n$, so the left-hand side is infinite, which is a contradiction. Hence a finite-state DTMC has at least one recurrent state.
An infinite-state DTMC need not: for example, the random walk with $p > 1/2$ has all states transient.
Slide 92
Recurrence in Infinite State DTMC
In order to be able to tell whether a class is recurrent, we need to be able to calculate the probability of return for at least one state.
Let's label this state 0 and denote by $f_{j,0}$ the probability that the DTMC ever reaches state 0, given that it starts in state $j$. Then we see that the sequence $\{f_{j,0}\}$ satisfies the equation
$$f_{j,0} = p_{j0} + \sum_{k \neq 0} p_{jk} f_{k,0}. \qquad (*)$$
Slide 93
Solving the equation (*)
We illustrate how to solve this equation by example.
Example: Consider a random walk on the nonnegative integers:
$$p_{j,j+1} = p = 1 - p_{j,j-1}, \quad \text{for } j > 0,$$
and
$$p_{0,1} = p = 1 - p_{0,0}.$$
(And $p_{ij} = 0$ otherwise.) Equation (*) says that, for $j > 1$,
$$f_{j,0} = p f_{j+1,0} + (1 - p) f_{j-1,0}$$
and, for $j = 0, 1$,
$$f_{j,0} = p f_{j+1,0} + (1 - p).$$
Slide 94
Solving the equation (*)
The first equation is a second-order linear difference equation with constant coefficients.
These can be solved in a similar way to second-order linear differential equations with constant coefficients, which you learned about in Calculus II or Accelerated Mathematics II. Recall that, to solve
$$a \frac{d^2 y}{dt^2} + b \frac{dy}{dt} + c y = 0,$$
we try a solution of the form $y = y(t) = e^{\lambda t}$ to obtain the Characteristic Equation
$$a\lambda^2 + b\lambda + c = 0.$$
Slide 95
Solving the equation (*)
If the characteristic equation has distinct roots, $\lambda_1$ and $\lambda_2$, the general solution has the form
$$y = A e^{\lambda_1 t} + B e^{\lambda_2 t}.$$
If the roots are coincident, the general solution has the form
$$y = A e^{\lambda_1 t} + B t e^{\lambda_1 t}.$$
In both cases, the values of the constants $A$ and $B$ are determined by the initial conditions.
Slide 96
Solving the equation (*)
The method for solving second-order linear difference equations with constant coefficients is similar. To solve
$$a u_{j+1} + b u_j + c u_{j-1} = 0,$$
we try a solution of the form $u_j = m^j$ to obtain the Characteristic Equation
$$a m^2 + b m + c = 0.$$
Slide 97
Solving the equation (*)
If this equation has distinct roots, $m_1$ and $m_2$, the general solution has the form
$$u_j = A m_1^j + B m_2^j.$$
If the roots are coincident, the general solution has the form
$$u_j = A m_1^j + B j m_1^j.$$
The values of the constants $A$ and $B$ need to be determined by boundary conditions, or other information that we have.
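The recipe above can be checked numerically. For the random-walk equation, $f_j = p f_{j+1} + (1-p) f_{j-1}$ rearranges to $p\,u_{j+1} - u_j + (1-p)\,u_{j-1} = 0$, so $a = p$, $b = -1$, $c = 1-p$. A small sketch (not from the notes; the values of $p$, $A$, $B$ are arbitrary illustrations):

```python
import numpy as np

# Characteristic equation p*m^2 - m + (1-p) = 0 for the random-walk
# recurrence p*u_{j+1} - u_j + (1-p)*u_{j-1} = 0.
p = 0.7
roots = np.sort(np.roots([p, -1.0, 1 - p]).real)
# The roots are m = (1-p)/p and m = 1.

# Any combination u_j = A*m1^j + B*m2^j satisfies the recurrence:
A, B = 0.3, 0.7
u = lambda j: A * roots[0] ** j + B * roots[1] ** j
for j in range(1, 10):
    assert abs(p * u(j + 1) - u(j) + (1 - p) * u(j - 1)) < 1e-12
```

The root $m = 1$ always appears here because the coefficients $p$ and $1-p$ sum to 1, which is why constant solutions are possible.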
Slide 98
Solving the equation (*)
If $(1-p)/p > 1$, then the general solution is
$$f_{j,0} = A + B\left(\frac{1-p}{p}\right)^j.$$
Similarly, if $(1-p)/p = 1$, the general solution is of the form
$$f_{j,0} = A + Bj.$$
In either case, these can only be probabilities if $B = 0$, and then notice
$$A = f_{1,0} = p f_{2,0} + (1-p) = pA + (1-p),$$
so $A = 1$. This makes sense because $p \le 1/2$, and so we have a neutral or downward drift.
Slide 100
Solving the equation (*)
However, if $(1-p)/p < 1$, then both constants can be non-zero, and we need another way to identify $\{f_{j,0}\}$. Let $f_{j,0}(m)$ be the probability that the DTMC reaches state 0 within $m$ steps, given that it starts in state $j$, and let $\{g_{j,0}\}$ be any nonnegative solution to (*), so that
$$g_{j,0} = p_{j0} + \sum_{k \neq 0} p_{jk} g_{k,0}.$$
We show by induction that $f_{j,0}(m) \le g_{j,0}$ for all $m$. Clearly this is true for $m = 1$. Assume that it is true for $m = \ell$. Then
$$f_{j,0}(\ell+1) = p_{j0} + \sum_{k \neq 0} p_{jk} f_{k,0}(\ell) \le p_{j0} + \sum_{k \neq 0} p_{jk} g_{k,0} = g_{j,0}.$$
It follows that $f_{j,0} = \lim_{m \to \infty} f_{j,0}(m) \le g_{j,0}$, and so $\{f_{j,0}\}$ is the minimal nonnegative solution to (*).
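The limit $\lim_m f_{j,0}(m)$ can be computed directly by iterating the defining recursion from an all-zero start, since the iterates increase to the minimal nonnegative solution. A numerical sketch (not from the notes; the truncation boundary `J` and iteration count are my own choices, with the infinite state space cut off at `J` where the value is pinned to 0):

```python
# Successive approximation for the random walk with p = 0.7: iterate
#   f_{j,0}(m+1) = p_{j0} + sum_{k != 0} p_{jk} f_{k,0}(m)
# starting from all zeros.
p, J, iters = 0.7, 100, 5000
h = [1.0] + [0.0] * J     # h[0] = 1 is a boundary convention: already at 0
for _ in range(iters):
    h = ([1.0]
         + [(1 - p) * h[j - 1] + p * h[j + 1] for j in range(1, J)]
         + [0.0])         # artificial absorbing boundary at J
# For j >= 1, h[j] converges to the minimal solution ((1-p)/p)^j.
```

With upward drift ($p > 1/2$) the truncation at `J = 100` is harmless, because $((1-p)/p)^{100}$ is negligibly small.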
Slide 102
Solving the equation (*)
For the random walk with $(1-p)/p < 1$, the general solution for $j \ge 1$ was of the form
$$f_{j,0} = A + B\left(\frac{1-p}{p}\right)^j.$$
The minimal nonnegative solution for $j > 0$ is
$$f_{j,0} = \left(\frac{1-p}{p}\right)^j,$$
and $f_{0,0} = 2(1-p)$.
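The minimal solution can be sanity-checked by simulation: estimate the probability that the walk started at $j$ ever reaches 0 and compare with $((1-p)/p)^j$. A Monte Carlo sketch (not from the notes; the function name, trial count, and path cap are my own choices, and paths are truncated because with $p > 1/2$ a path that drifts away essentially never returns):

```python
import random

random.seed(2025)   # for reproducibility

def hit_prob_sim(p, start, trials=10_000, cap=500):
    """Estimate f_{j,0}: the probability that the walk started at
    start >= 1 ever reaches 0, truncating each path after cap steps."""
    hits = 0
    for _ in range(trials):
        x = start
        for _ in range(cap):
            x += 1 if random.random() < p else -1
            if x == 0:
                hits += 1
                break
    return hits / trials

p, j = 0.7, 3
theory = ((1 - p) / p) ** j      # minimal solution: (3/7)^3
est = hit_prob_sim(p, j)
```

With 10,000 trials the estimate typically lands within about $\pm 0.01$ of the theoretical value.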
Slide 103
The Gambler's Ruin Problem
Denote the initial capital of a gambler by $N$.
The gambler will stop playing if he/she wins \$$M$ or loses his/her initial stake of \$$N$.
There is a probability $p$ that the gambler wins \$1 and a probability $1-p$ that he/she loses \$1 on each game.
We assume that the outcomes of successive plays are independent.
This is a simple DTMC with a finite state space $\{-N, \ldots, M\}$ and transition probabilities $p_{j,j+1} = p$ and $p_{j,j-1} = 1-p$ for $j \in \{-N+1, \ldots, M-1\}$, and $p_{-N,-N} = p_{M,M} = 1$.
The gambler would like to know the probability that he/she will win \$$M$ before becoming bankrupt.
Slide 104
The Gambler's Ruin Problem
Let $f_{j,-N}$ denote the probability that the gambler is ruined (the DTMC hits $-N$ before $M$), given that it starts in state $j$. For $p \neq 1/2$, the general solution is
$$f_{j,-N} = A + B\left(\frac{1-p}{p}\right)^j,$$
with boundary conditions $f_{M,-N} = 0$ and $f_{-N,-N} = 1$.
The upper boundary condition gives us
$$A = -B\left(\frac{1-p}{p}\right)^M,$$
and the lower boundary condition gives us
$$B = \left[\left(\frac{1-p}{p}\right)^{-N} - \left(\frac{1-p}{p}\right)^M\right]^{-1},$$
so the general solution is
$$f_{j,-N} = \frac{\left(\frac{1-p}{p}\right)^j - \left(\frac{1-p}{p}\right)^M}{\left(\frac{1-p}{p}\right)^{-N} - \left(\frac{1-p}{p}\right)^M}.$$
Slide 106
The Gambler's Ruin Problem
When $p = 1/2$, the general solution to the first equation is
$$f_{j,-N} = A + Bj.$$
The upper boundary condition gives us
$$A = -BM,$$
and the lower boundary condition gives us
$$B = -\frac{1}{M+N},$$
so the general solution is
$$f_{j,-N} = \frac{M - j}{M + N}.$$
Slide 107
The Gamblers Ruin Problem
The expected gain is $E(G) = M - (N+M) f_{0,-N}$. Here are some numbers:
If $N = 90$, $M = 10$ and $p = 0.5$, then $f_{0,-N} = 0.1$.
If $N = 90$, $M = 10$ and $p = 0.45$, then $f_{0,-N} = 0.866$.
If $N = 90$, $M = 10$ and $p = 0.4$, then $f_{0,-N} = 0.983$.
If $N = 99$, $M = 1$ and $p = 0.4$, then $f_{0,-N} = 0.333$.
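The numbers above follow directly from the two formulas just derived. A small sketch that reproduces them (not from the notes; the function names are my own):

```python
def ruin_prob(N, M, p):
    """Probability f_{0,-N} of hitting -N (ruin) before M, starting
    from 0, using the gambler's ruin formulas derived above."""
    if p == 0.5:
        return M / (M + N)
    theta = (1 - p) / p
    return (1 - theta ** M) / (theta ** (-N) - theta ** M)

def expected_gain(N, M, p):
    """E(G) = M - (N + M) * f_{0,-N}."""
    return M - (N + M) * ruin_prob(N, M, p)
```

For example, `ruin_prob(90, 10, 0.45)` gives approximately 0.866, matching the slide, and `expected_gain(90, 10, 0.5)` is 0: a fair game has zero expected gain.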
Slide 108
Long run behaviour of DTMCs
We want to know the proportion of time a DTMC spends in each state over the long run (if this concept makes sense), which should be the same as the limiting probabilities $\lim_{n\to\infty} p^{(n)}_{kj}$.
These will be zero for transient states and non-essential states.
For an irreducible and recurrent DTMC, we will see that these limiting probabilities exist and are even independent of $k$.
Slide 109
Long run behaviour of DTMCs
Recall that we used $T_i(j)$ to denote the time between the $(i-1)$st and $i$th return to state $j$. We then defined state $j$ (and hence its communicating class) to be
transient if $T_i(j) = \infty$ with positive probability, and
recurrent if $T_i(j) < \infty$ with probability one.
There is a further classification of recurrent states. Specifically, $j$ is
null-recurrent if $E[T_i(j)] = \infty$, and
positive-recurrent if $E[T_i(j)] < \infty$.
This classification is important for the calculation of the limiting probabilities.
Slide 110
Long run behaviour of DTMCs
Examples
The symmetric random walk with $p = q = 1/2$: for all $j$, $T_i(j) < \infty$ with probability one, but $E[T_i(j)] = \infty$. That is, all states are null-recurrent.
A finite irreducible DTMC: $E[T_i(j)] < \infty$ for all $j$, so all states are positive-recurrent.
Slide 111
Long run behaviour of DTMCs
In the long run, how often does a DTMC visit a state $j$? Let $\mu_j \equiv E[T_1(j) \mid X_0 = j] < \infty$. By the Law of Large Numbers, $T_1(j) + T_2(j) + \cdots + T_k(j) \approx \mu_j k$. So there are approximately $k$ visits in $k\mu_j$ time-steps, and the relative frequency of visits to $j$ is $1/\mu_j$. This leads us to
Theorem: If $j$ is an aperiodic state in a positive recurrent communicating class, then $\pi_j = E[T_1(j) \mid X_0 = j]^{-1}$, which is the limiting probability $\lim_{n\to\infty} p^{(n)}_{kj}$.
Theorem: If the states of a communicating class $S_r$ with period $d$ are recurrent with a finite expected recurrence time, then for $0 \le k \le d-1$, $\{X_{nd+k} \mid X_0 \in S_r\}$ is an ergodic DTMC with state space $S_r^{(k)}$.
For any $\ell$ and $k = 0, 1, \ldots, d-1$,
$$P(X_{nd+k} = j \mid X_0 \in S_r^{(\ell)}) \to \pi^{(r)}_j \ \text{as } n \to \infty \ \text{for } j \in S_r^{(\ell + k \bmod d)},$$
and $P(X_{nd+k} = j \mid X_0 \in S_r^{(\ell)}) = 0$ for $j \notin S_r^{(\ell + k \bmod d)}$, so $\sum_{j \in S_r^{(\ell)}} \pi^{(r)}_j = 1$ for any $\ell$.
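The theorem can be seen concretely on the period-2 example from earlier, with subclasses $\{0\}$ and $\{1,2\}$: powers $P^n$ do not converge, but along even (or odd) powers the rows settle on one subclass. A small sketch (not from the notes):

```python
import numpy as np

# The period-2 chain from the earlier example: subclasses {0} and {1, 2}.
P = np.array([[0,   0.5, 0.5],
              [1.0, 0,   0],
              [1.0, 0,   0]])
P_even = np.linalg.matrix_power(P, 1000)   # even number of steps
P_odd = np.linalg.matrix_power(P, 1001)    # odd number of steps
# Starting in {0}: at even times all mass is back on {0},
# and at odd times it is spread over {1, 2}.
```

The even-power row from state 0 is $(1, 0, 0)$ and the odd-power row is $(0, 0.5, 0.5)$: each is a probability distribution concentrated on a single subclass, exactly as the theorem describes.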
Slide 126
Discrete-Time Markov Chains
Example
Classify the DTMC with
$$P = \begin{pmatrix}
0 & 0 & 0.1 & 0.9 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/3 & 2/3 & 0 \\
0 & 0 & 0 & 0 & 1/5 & 4/5 & 0 \\
1/4 & 1/4 & 0 & 0 & 1/8 & 1/4 & 1/8
\end{pmatrix}$$
and discuss its properties.
Slide 127
Good Trick
Sometimes we want to model a physical system where the future depends on part of the past. Consider the following example.
A sequence of random variables $\{X_n\}$ describes the weather at a particular location, with $X_n = 1$ if it is sunny and $X_n = 2$ if it is rainy on day $n$.
Suppose that the weather on day $n+1$ depends on the weather conditions on days $n-1$ and $n$ as is shown below:
$$P(X_{n+1} = 2 \mid X_n = X_{n-1} = 2) = 0.6$$
$$P(X_{n+1} = 1 \mid X_n = X_{n-1} = 1) = 0.8$$
$$P(X_{n+1} = 2 \mid X_n = 2, X_{n-1} = 1) = 0.5$$
$$P(X_{n+1} = 1 \mid X_n = 1, X_{n-1} = 2) = 0.75$$
Slide 128
Good Trick
If we put $Y_n = (X_{n-1}, X_n)$, then $Y_n$ is a DTMC. The possible states are $1 = (1,1)$, $2 = (1,2)$, $3 = (2,1)$ and $4 = (2,2)$. We see that $\{Y_n : n \ge 1\}$ is a DTMC with transition matrix
$$P = \begin{pmatrix}
0.8 & 0.2 & 0 & 0 \\
0 & 0 & 0.5 & 0.5 \\
0.75 & 0.25 & 0 & 0 \\
0 & 0 & 0.4 & 0.6
\end{pmatrix}.$$
Slide 129