Markov Chains
Introduction
Simulation
Modelling Cloud Cover Data
Finite State Markov Chains
Definition: The state space is $S = \{1, 2, \ldots, m\}$.
Definition: The sequence of random variables $X_1, X_2, X_3, \ldots$ is called a Markov chain if the Markov property holds:
$$P(X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots) = P(X_n = x_n \mid X_{n-1} = x_{n-1})$$
where $x_n, x_{n-1}, x_{n-2}, \ldots$ are elements of $S$.
Example
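$B_1, B_2, \ldots$ are independent Bernoulli random variables with parameter $p$. Then
$$X_n = \left(\sum_{k=1}^{n} B_k \bmod 2\right) + 1, \qquad n = 1, 2, \ldots$$
is a Markov chain with state space $S = \{1, 2\}$. The chain switches state exactly when $B_n = 1$, and $B_n$ is independent of $B_1, \ldots, B_{n-1}$, so
$P(X_n = 1 \mid X_{n-1} = 1) = 1 - p$,
$P(X_n = 2 \mid X_{n-1} = 1) = p$,
$P(X_n = 1 \mid X_{n-1} = 2) = p$,
$P(X_n = 2 \mid X_{n-1} = 2) = 1 - p$.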
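A quick simulation can make this example concrete. The sketch below builds $X_n$ from Bernoulli draws exactly as defined above and estimates the switching probabilities by counting consecutive pairs. It also conditions on $X_{n-2}$, to illustrate the Markov property: the extra conditioning should leave the estimate near $p$. The value p = 0.3, the seed and the series length are illustrative assumptions, not values from the slides.

```r
# Sketch: simulate the Bernoulli example and estimate its transition probabilities.
# p = 0.3, the seed and n are illustrative assumptions, not values from the slides.
set.seed(1)
p <- 0.3
n <- 1e5
B <- rbinom(n, size = 1, prob = p)   # B_1, ..., B_n
X <- cumsum(B) %% 2 + 1              # X_n = (B_1 + ... + B_n mod 2) + 1

from <- X[-n]                        # X_{n-1}
to   <- X[-1]                        # X_n
est12 <- mean(to[from == 1] == 2)    # estimates P(X_n = 2 | X_{n-1} = 1) = p
est21 <- mean(to[from == 2] == 1)    # estimates P(X_n = 1 | X_{n-1} = 2) = p

# Markov property: also conditioning on X_{n-2} should leave the estimate near p
prev <- X[1:(n - 2)]; cur <- X[2:(n - 1)]; nxt <- X[3:n]
est12.prev1 <- mean(nxt[cur == 1 & prev == 1] == 2)
est12.prev2 <- mean(nxt[cur == 1 & prev == 2] == 2)
round(c(est12, est21, est12.prev1, est12.prev2), 3)
```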
Transition Matrix
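Define a matrix $P$ with $(i, j)$th entry
$$p_{ij} = P(X_n = j \mid X_{n-1} = i).$$
$p_{ij}$ is called the transition probability from state $i$ to state $j$, and $P$ is called a transition matrix.
All rows of $P$ sum to one, because from state $i$ the chain must move to some state in $S$: $\sum_{j=1}^{m} p_{ij} = 1$. That is,
$$P \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$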
Example (cont’d)
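$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}, \qquad P \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$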
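The defining properties of a transition matrix are easy to check numerically. The helper `is.transition.matrix` below is a hypothetical name introduced only for this sketch (it is not defined in the slides); it tests that the matrix is square, that its entries lie in [0, 1], and that every row sums to one up to a floating-point tolerance. It is applied to the two-state matrix above with an assumed value p = 0.3.

```r
# Hypothetical helper (not from the slides): is P a valid transition matrix?
is.transition.matrix <- function(P, tol = 1e-8) {
  is.matrix(P) && nrow(P) == ncol(P) &&   # square
    all(P >= 0 & P <= 1) &&               # entries are probabilities
    all(abs(rowSums(P) - 1) < tol)        # each row sums to one
}

p <- 0.3                                  # illustrative value (assumption)
P2 <- matrix(c(1 - p, p,
               p, 1 - p), nrow = 2, byrow = TRUE)
P2 %*% rep(1, 2)                          # a column of ones, as above
is.transition.matrix(P2)                  # TRUE
is.transition.matrix(matrix(0.5, 2, 3))   # FALSE: not square
```

`rowSums(P)` computes the same row totals as multiplying by a vector of ones, without forming the product explicitly.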
Another Example
Set
$$P = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \end{pmatrix}.$$
Example
We can study this example with R. Set up the 3 × 3 matrix P (matrix() fills the entries column by column):
> P <- matrix(c(0, 0, 1, .5, .5, 0, .5, .5, 0), nrow=3)
> P
[,1] [,2] [,3]
[1,] 0 0.5 0.5
[2,] 0 0.5 0.5
[3,] 1 0.0 0.0
Example
Add up the rows:
> P%*%rep(1,3)
[,1]
[1,] 1
[2,] 1
[3,] 1
Simulating a Markov Chain
We want to simulate n values of a Markov chain having transition matrix P, starting at x1.
Function name: MC.sim
Input: n, P, x1 (x1 is optional; if it is missing, the starting state is chosen at random)
Output: a vector of length n
Reproducing Our Output
> options(width=55)
> set.seed(12867)
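Calling set.seed() with the same value before the same random calls reproduces the same draws, which is what makes the printed output repeatable. A general caveat about R (not something stated on the slides): R 3.6.0 changed the default method used by sample() (see ?RNGkind, argument sample.kind), so sequences generated under older versions may not be reproduced exactly by newer ones; set.seed(12867, sample.kind = "Rounding") requests the old method. The sketch below re-seeds at the end so that commands run afterwards start from the same random state.

```r
# The same seed before the same call gives the same draws
set.seed(12867)
a <- sample(1:3, 10, replace = TRUE)
set.seed(12867)
b <- sample(1:3, 10, replace = TRUE)
identical(a, b)   # TRUE

# Re-seed so that later commands start from the state set above
set.seed(12867)
```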
Example
Generate 20 values from the Markov chain with the 3 × 3 transition matrix P, starting in state 3:
> MC.sim(20, P, 3)
[1] 3 1 2 2 3 1 3 1 3 1 3 1 2 3 1 2 2 3 1 3
Another Example
$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$
> P.mat <- matrix(c(0.7,0.3,0.4,0.6),ncol=2,byrow=TRUE)
> MC.eg <- MC.sim(100, P.mat)   # no x1, so the starting state is random
Output
> MC.eg
[1] 2 2 2 1 1 1 1 1 1 2 2 1 1 2 2 1 1 1 1 1 1 2 1 1 1
[26] 1 1 2 2 2 2 2 1 1 1 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2
[51] 1 1 1 1 2 2 1 1 1 2 1 2 2 1 1 2 1 1 2 2 1 1 2 1 1
[76] 1 1 1 2 1 2 2 2 1 1 1 2 2 1 1 1 1 1 1 2 2 2 2 2 1
A Markov Chain Simulator
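> MC.sim <- function(n, P, x1) {
+   # Simulate n values of a Markov chain with transition matrix P
+   sim <- numeric(n)            # preallocate a vector of length n
+   m <- ncol(P)                 # number of states
+   if (missing(x1)) {
+     sim[1] <- sample(1:m, 1)   # random start
+   } else {
+     sim[1] <- x1
+   }
+   for (i in seq_len(n)[-1]) {  # i = 2, ..., n (skipped when n = 1)
+     # next state is drawn from row sim[i-1] of P
+     sim[i] <- sample(1:m, 1, prob = P[sim[i-1], ])
+   }
+   sim
+ }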
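Two base-R idioms matter in a simulator like MC.sim: preallocating the result vector, and looping over positions 2, ..., n. The lines below illustrate the difference between similar-looking calls. In particular, `as.numeric(n)` is just the number n, and `2:n` counts down when n = 1, whereas `seq_len(n)[-1]` is empty, so a loop over it is skipped.

```r
as.numeric(5)     # the single number 5, not a vector of length 5
numeric(5)        # 0 0 0 0 0: a preallocated vector of length 5
2:1               # 2 1: with n = 1, 2:n counts down and a loop would still run
seq_len(1)[-1]    # integer(0): a loop over this is skipped when n = 1
seq_len(5)[-1]    # 2 3 4 5
```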
Understanding the Code
Use of the sample() function:
> m <- 3
> sample(1:m,1,prob=c(.2, .7, .1))
[1] 2
The above simulates a random variable with distribution
p(1) = .2, p(2) = .7, p(3) = .1.
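To see that sample() with a prob argument draws from the stated distribution, one can draw many values and compare the observed proportions with p(1) = .2, p(2) = .7, p(3) = .1. The seed and the number of draws are arbitrary choices for this sketch; because it calls set.seed(), running it in the same session changes the random numbers used by later commands.

```r
set.seed(2)                                           # arbitrary seed
m <- 3
draws <- sample(1:m, 10000, replace = TRUE, prob = c(.2, .7, .1))
freq <- tabulate(draws, nbins = m) / length(draws)    # observed proportions of 1, 2, 3
round(freq, 3)                                        # should be close to .2, .7, .1
```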
Understanding the Code
If the current value of the Markov chain, having transition matrix P, is j, then the next value can be simulated with
> j <- 2
> sample(1:m,1,prob=P[j,])
[1] 2
This simulates a random variable with distribution $p_{j1}, p_{j2}, p_{j3}$ (here $j = 2$).
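A similar check applies to the transition step. Drawing many times from row j = 2 of the 3 × 3 matrix P should give states 2 and 3 about equally often and never state 1, since $p_{21} = 0$. The seed and sample size are arbitrary choices for this sketch; running it in the same session shifts the random stream used by later commands.

```r
P <- matrix(c(0, 0, 1, .5, .5, 0, .5, .5, 0), nrow = 3)   # the 3-state example
m <- ncol(P)
j <- 2                                     # current state
set.seed(3)                                # arbitrary seed
nxt <- sample(1:m, 5000, replace = TRUE, prob = P[j, ])
# Row 2 of P is (0, 0.5, 0.5): state 1 should never be drawn
table(factor(nxt, levels = 1:m)) / length(nxt)
```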
Analyzing Cloud Cover Data
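> MC2 <- function(x) {
+   # Fit a 2-state MC to data in vector x (S = {1, 2})
+   n <- length(x)
+   # N1, N2: visits to states 1 and 2, excluding the last observation
+   N1 <- sum(x[-n]==1)
+   N2 <- sum(x[-n]==2)
+   # Nij: number of transitions from state i to state j
+   N11 <- sum(x[-n]==1 & x[-1]==1)
+   N12 <- sum(x[-n]==1 & x[-1]==2)
+   N21 <- sum(x[-n]==2 & x[-1]==1)
+   N22 <- sum(x[-n]==2 & x[-1]==2)
+   # Estimate p_ij by Nij/Ni (matrix() fills column by column)
+   P <- matrix(c(N11/N1, N21/N2, N12/N1, N22/N2), nrow=2)
+   return(P)
+ }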
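MC2 uses the usual (maximum likelihood) estimate $\hat p_{ij} = N_{ij} / N_i$, where $N_{ij}$ counts transitions from $i$ to $j$ and $N_i$ counts visits to $i$ before the last observation. The same estimate can be written with table(); `MC2.table` below is a hypothetical helper for this sketch, not part of the slides. Declaring the factor levels keeps the count table square even if some state never occurs, and the `states` argument lets the same code handle more than two states.

```r
# Hypothetical alternative to MC2 (not from the slides)
MC2.table <- function(x, states = 1:2) {
  from <- factor(x[-length(x)], levels = states)   # X_{t-1}
  to   <- factor(x[-1], levels = states)           # X_t
  counts <- table(from, to)                        # counts[i, j] = N_ij
  unclass(prop.table(counts, margin = 1))          # divide row i by N_i
}

x <- c(1, 1, 2, 2, 2, 1, 1, 2)
P.t <- MC2.table(x)
P.t   # row 1: 0.5 0.5; row 2: 1/3 2/3
```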
Analyzing Cloud Cover Data
> source("cloud70.R")
> # this vector contains hourly records of cloud data
> # from Winnipeg, Canada for June 1, 1970 through
> # September 30, 1970. "1" is clear; "2" is cloudy
> # (cloudy means the measured value exceeds 2)
> cloud70[1:100] # observed data (first 100 hours)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
[26] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[76] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 2 2 2 2 1 2 2
Analyzing Cloud Cover Data
> P <- MC2(cloud70)
> P # estimated transition matrix for hourly cloud cover
[,1] [,2]
[1,] 0.8211998 0.1788002
[2,] 0.2537190 0.7462810
Analyzing Cloud Cover Data
Simulate data from the estimated transition matrix to see whether the Markov chain model is realistic:
> cloud70.sim <- MC.sim(length(cloud70), P)
> cloud70.sim[1:100]
[1] 1 1 1 1 1 2 1 1 2 2 2 2 2 1 1 2 2 2 1 1 1 1 1 1 2
[26] 2 2 2 1 1 1 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 2 2 2 2
[51] 2 2 1 1 2 1 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2
[76] 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2
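Before comparing simulated and observed data, it can help to confirm that a long series simulated from the estimated matrix gives back that matrix when re-estimated, so that differences from the observed data point to the model rather than to the estimation step. The sketch assumes the estimated matrix printed above (values copied by hand), an arbitrary seed and a series length of 20000; it writes the simulation loop inline rather than calling MC.sim so that it runs on its own.

```r
# Estimated transition matrix for the cloud data, copied from the output of MC2(cloud70)
P <- matrix(c(0.8211998, 0.1788002,
              0.2537190, 0.7462810), nrow = 2, byrow = TRUE)
set.seed(4)                 # arbitrary seed
n <- 20000                  # arbitrary series length
x <- numeric(n)
x[1] <- 1
for (i in 2:n) x[i] <- sample(1:2, 1, prob = P[x[i - 1], ])

# Re-estimate p_ij as (transitions i -> j) / (visits to i)
P.hat <- prop.table(table(factor(x[-n], levels = 1:2),
                          factor(x[-1], levels = 1:2)), margin = 1)
round(P.hat, 3)             # should be close to P
```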
Checking the Markov Chain model
Compare run lengths in the simulated and real data:
> MC2.chk <- function(x) {
+   # Fit the chain and simulate a series of the same length,
+   # starting from the first observed state
+   P <- MC2(x)
+   x.sim <- MC.sim(length(x), P, x[1])
+   # Run lengths of each state, selected by the state of each run
+   x.runs <- rle(x)
+   x.1 <- x.runs$lengths[x.runs$values == 1]
+   x.2 <- x.runs$lengths[x.runs$values == 2]
+   x.sim.runs <- rle(x.sim)
+   x.sim.1 <- x.sim.runs$lengths[x.sim.runs$values == 1]
+   x.sim.2 <- x.sim.runs$lengths[x.sim.runs$values == 2]
+   # Side-by-side QQ plots with the line y = x for reference
+   par(mfrow=c(1,2))
+   qqplot(x.1, x.sim.1, xlab="observed state 1 runlengths",
+          ylab="simulated state 1 runlengths")
+   abline(0,1)
+   qqplot(x.2, x.sim.2, xlab="observed state 2 runlengths",
+          ylab="simulated state 2 runlengths")
+   abline(0,1)
+ }
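rle() returns the lengths and values of the runs in a vector. Selecting the lengths by value gives the run lengths of each state no matter which state the series starts in, whereas taking every other run (runs 1, 3, 5, ...) gives the state 1 runs only when the series starts in state 1. The short vectors below are made-up illustrations, not cloud data.

```r
x <- c(1, 1, 2, 2, 2, 1, 2, 2)
r <- rle(x)
r$lengths                      # 2 3 1 2
r$values                       # 1 2 1 2
r$lengths[r$values == 1]       # state 1 runs: 2 1
r$lengths[r$values == 2]       # state 2 runs: 3 2

# Taking every other run works only if the series starts in state 1:
y <- c(2, 2, 1, 1, 1, 2)
ry <- rle(y)
ry$lengths[seq(1, length(ry$lengths), 2)]   # 2 1: these are state 2 runs
ry$lengths[ry$values == 1]                  # 3: the actual state 1 run
```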
Checking the Markov Chain model
According to the following plots, clear periods in Winnipeg last longer than the Markov chain predicts, but cloudy periods are predicted well.
> MC2.chk(cloud70)
[Figure: two QQ plots produced by MC2.chk(cloud70), each with the reference line y = x. Left: observed state 1 runlengths (x-axis) vs simulated state 1 runlengths (y-axis). Right: observed state 2 runlengths (x-axis) vs simulated state 2 runlengths (y-axis).]
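One possible explanation for the short simulated clear spells follows from the model itself (this is a property of the fitted two-state chain, not an analysis from the slides). Once the chain enters state $i$ it stays for another hour with probability $p_{ii}$, so a run in state $i$ has the geometric distribution $P(L = k) = p_{ii}^{k-1}(1 - p_{ii})$, $k = 1, 2, \ldots$, with mean $1/(1 - p_{ii})$ and $P(L \ge k) = p_{ii}^{k-1}$. The sketch evaluates these quantities for the estimated cloud-cover matrix, with values copied from the printed output of MC2(cloud70); a geometric tail decays quickly, leaving little room for very long clear spells.

```r
# Estimated transition matrix copied from the output of MC2(cloud70)
P <- matrix(c(0.8211998, 0.1788002,
              0.2537190, 0.7462810), nrow = 2, byrow = TRUE)
stay <- diag(P)                             # p_11 (clear stays clear), p_22 (cloudy stays cloudy)
mean.run <- 1 / (1 - stay)                  # mean run length in hours: 1 / (1 - p_ii)
prob.at.least <- function(k) stay^(k - 1)   # P(run length >= k) = p_ii^(k - 1)
round(mean.run, 2)                          # 5.59 3.94
prob.at.least(24)                           # chance that a run lasts at least a day
```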