
Short Course: Markov chains & mixing times

Lecture 1: Yuval Peres, Microsoft Research


Introduction

Definition of Markov chains

• A process (X_t)_{t=0,1,...} is a (time-homogeneous) Markov chain if

P(X_{k+1} = y | X_0 = x_0, . . . , X_{k−1} = x_{k−1}, X_k = x) = p(x, y) ,

where p(x, y) ≥ 0 and ∑_y p(x, y) = 1 for all x.

• The transition kernel is P = (p(x, y))_{x,y}.

• The stationary measure π satisfies πP = π.

• Aperiodic: gcd{t ≥ 1 : P^t(x, x) > 0} = 1 for all x.

• Irreducible: for all x, y, there exists t ∈ N such that P^t(x, y) > 0.

• Reversible: π(x)p(x, y) = π(y)p(y, x) for all x, y.

Most examples in this talk are aperiodic, irreducible and reversible.
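A minimal sketch (our own toy example, not from the lecture; assumes numpy) checking these definitions for a 3-state chain:

    import numpy as np

    # Transition kernel P = (p(x, y))_{x,y}: rows are non-negative and sum to 1.
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])

    # Stationary measure: the left eigenvector of P for eigenvalue 1, i.e. pi P = pi.
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    pi /= pi.sum()

    assert np.allclose(pi @ P, pi)     # stationarity: pi P = pi
    D = pi[:, None] * P                # D[x, y] = pi(x) p(x, y)
    assert np.allclose(D, D.T)         # reversibility: pi(x)p(x,y) = pi(y)p(y,x)

Here pi comes out as (1/4, 1/2, 1/4), and this chain is also aperiodic and irreducible.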


Classical theory vs. modern focus

• An important feature of aperiodic and irreducible (finite) Markov chains: the distribution at time t converges to the (unique) stationary distribution as t → ∞.

• Classical theory: fix the chain and study the rate of convergence of the distribution at time t to stationarity, as t → ∞.

• Modern focus: fix the target distance to stationarity and study the asymptotics of the time required to reach that target, for a family of chains as the size goes to infinity. (Aldous, Diaconis, . . . )


Motivations

• Statistical physics.
  • Monte Carlo simulation.
  • Models of dynamical processes.
  • Deep connections between the convergence rate and the spatial properties of the physical model.
• Biology.
  • Models of DNA evolution.
  • A much simplified chain of Durrett: given a permutation, take a random segment of fixed length and reverse it (see the sketch after this list).
• Computer science.
  • Sampling.
  • Approximate counting of combinatorial structures.
• Card shuffling.

. . .
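A minimal sketch of one step of Durrett's segment-reversal chain (the function name and the fixed segment length L are our own illustration):

    import random

    def reverse_segment_step(perm, L):
        # One move: choose a segment of fixed length L uniformly at
        # random and reverse it in place.
        n = len(perm)
        i = random.randrange(n - L + 1)        # left endpoint of the segment
        perm[i:i + L] = reversed(perm[i:i + L])
        return perm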


More on sampling

• Random permutation: an example that is easy to sample directly.

• Knuth algorithm: for i = 1, . . . , n − 1, select U_i from {i, . . . , n} uniformly at random and swap the two elements at positions i and U_i (a sketch follows after this list).

• Many other models are hard to sample directly.
• Ising model: more in later lectures.
• Coloring model: the uniform measure on all proper colorings of a graph with q colors, where in a proper coloring any two adjacent vertices are assigned different colors.
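A minimal sketch of the Knuth (Fisher-Yates) shuffle, written 0-indexed:

    import random

    def knuth_shuffle(xs):
        # For i = 0, ..., n-2, swap position i with a uniform position
        # in {i, ..., n-1}; the result is a uniformly random permutation.
        n = len(xs)
        for i in range(n - 1):
            u = random.randrange(i, n)     # U_i, uniform on {i, ..., n-1}
            xs[i], xs[u] = xs[u], xs[i]
        return xs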


An illustration of coloring model


Total variation distance

For two distributions µ and ν on a (discrete) space Ω, the total variation distance ‖µ − ν‖TV is defined in the following equivalent ways:

• ‖µ − ν‖TV := (1/2) ∑_{x∈Ω} |µ(x) − ν(x)|.

• ‖µ − ν‖TV := sup_{A⊂Ω} (µ(A) − ν(A)).

• For all couplings (X, Y) where X ∼ µ and Y ∼ ν, we have P(X ≠ Y) ≥ ‖µ − ν‖TV. Furthermore, there exists a coupling such that equality holds.

An important feature: advancing a Markov chain can only decrease the total variation distance. That is,

‖µP − νP‖TV ≤ ‖µ − ν‖TV .
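A minimal sketch (reusing the 3-state kernel from the earlier sketch) of the first definition and of this contraction property:

    import numpy as np

    def tv(mu, nu):
        # Total variation distance: half the L1 distance.
        return 0.5 * np.abs(mu - nu).sum()

    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])
    mu = np.array([1.0, 0.0, 0.0])      # point mass at state 0
    nu = np.array([0.0, 0.0, 1.0])      # point mass at state 2

    assert tv(mu @ P, nu @ P) <= tv(mu, nu) + 1e-12   # contraction under one step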


Mixing time and its first property

• We are interested in the decay of d(t) := max_{x∈Ω} ‖P^t(x, ·) − π‖TV.

• ‖µP − νP‖TV ≤ ‖µ − ν‖TV ⇒ d(t) is decreasing in t.

• The mixing time tMIX(ε) := min{t ≥ 0 : d(t) ≤ ε}, for some ε ∈ (0, 1). Furthermore, tMIX := tMIX(1/4) by convention.

• Since d(t + s) ≤ 2d(t)d(s), we have tMIX(2^{−k}) ≤ k · tMIX.
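A minimal sketch (our own helper names; numpy assumed) computing d(t) and tMIX for a small chain with known stationary distribution pi:

    import numpy as np

    def d(P, pi, t):
        # d(t) = max_x ||P^t(x, .) - pi||_TV.
        Pt = np.linalg.matrix_power(P, t)
        return 0.5 * np.abs(Pt - pi).sum(axis=1).max()

    def t_mix(P, pi, eps=0.25):
        # Smallest t with d(t) <= eps; well defined since d(t) decreases
        # to 0 for an aperiodic irreducible chain.
        t = 0
        while d(P, pi, t) > eps:
            t += 1
        return t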


Mixing time vs. relaxation time

• The transition kernel P always has 1 as the largest eigenvalue.

• Define λ⋆ := max{|λ| : λ is an eigenvalue of P, λ ≠ 1} and the spectral gap 1 − λ⋆. Furthermore, define the relaxation time tREL := 1/(1 − λ⋆).

• An important relation between relaxation time and mixing time:

(tREL − 1) log(1/(2ε)) ≤ tMIX(ε) ≤ log(1/(ε πmin)) · tREL ,

where πmin = min_{x∈Ω} π(x).

• Seemingly, tREL and tMIX are roughly the same. However, the possible difference between tREL and tMIX can reveal deep features of a Markov chain.
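A minimal sketch (numpy assumed; intended for the reversible chains considered here, whose eigenvalues are real):

    import numpy as np

    def relaxation_time(P):
        # lambda_* = largest |lambda| over the non-trivial eigenvalues;
        # tREL = 1/(1 - lambda_*). For an irreducible chain the eigenvalue 1
        # is simple, so after sorting absolute values it is exactly abs_ev[0].
        abs_ev = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
        lam_star = abs_ev[1]
        return 1.0 / (1.0 - lam_star)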


Random walks on cycles and hypercubes

A chain is called lazy if P(x, x) ≥ 1/2 for all x. All the eigenvalues of a lazy chain are non-negative.

• Lazy random walk on a cycle of length n.
  • The second eigenvalue is λ2 = (1 + cos(2π/n))/2, with corresponding eigenfunction f2(k) = cos(2πk/n).
  • The relaxation time tREL ≍ n^2.
  • The mixing time tMIX ≍ n^2.
• Lazy random walk on the hypercube {−1, 1}^n.
  • The second eigenvalue is λ2 = 1 − 1/n, and one of the corresponding eigenfunctions is f2(x) = ∑_{i=1}^n x_i, where x_i ∈ {−1, 1} denotes the i-th coordinate.
  • The relaxation time is tREL = n.
  • The mixing time tMIX ≍ n log n.
• In the cycle, the relaxation time and the mixing time have the same order, while in the hypercube the mixing time is of larger order.
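A minimal numerical check (numpy assumed) of the cycle eigenvalue and eigenfunction stated above:

    import numpy as np

    n = 10
    # Lazy random walk on the n-cycle: hold w.p. 1/2, step +-1 w.p. 1/4 each.
    P = np.zeros((n, n))
    for k in range(n):
        P[k, k] = 0.5
        P[k, (k + 1) % n] = 0.25
        P[k, (k - 1) % n] = 0.25

    lam2 = (1 + np.cos(2 * np.pi / n)) / 2
    f2 = np.cos(2 * np.pi * np.arange(n) / n)
    assert np.allclose(P @ f2, lam2 * f2)    # P f2 = lambda_2 f2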


Couplings of Markov Chains

Let P be a transition matrix for a Markov chain. A coupling of a P-Markov chain started at x and a P-Markov chain started at y is a sequence (X_n, Y_n)_{n=0}^∞ such that

• all variables X_n and Y_n are defined on the same probability space,
• (X_n) is a P-Markov chain started at x, and
• (Y_n) is a P-Markov chain started at y.


Example: The lazy random walk on the n-cycle.

• This chain remains at its current position with probability 1/2, and moves to each of the two adjacent sites with probability 1/4.

• We can couple the chains started from x and y as follows (a simulation sketch follows below):
  • Flip a fair coin to decide whether the X-chain or the Y-chain moves.
  • Move the selected chain to one of its two neighboring sites, chosen with equal probability.
• Both the x-particle and the y-particle are performing lazy simple random walks on the n-cycle.
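A minimal sketch of this coupling (our own helper name), returning the first time the particles meet:

    import random

    def couple_cycle(n, x, y):
        # Each step: a fair coin picks which particle moves; the chosen one
        # steps +-1 with equal probability. Marginally, each particle is a
        # lazy simple random walk on the n-cycle.
        t = 0
        while x != y:
            if random.random() < 0.5:
                x = (x + random.choice([-1, 1])) % n
            else:
                y = (y + random.choice([-1, 1])) % n
            t += 1
        return t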


Mixing and Coupling

• Let (X_t, Y_t)_{t=0}^∞ be a coupling of a P-chain started from x and a P-chain started at y.

• Let τ := min{t ≥ 0 : X_t = Y_t}.

If the coupling is Markovian (and we will only consider those), it can always be redefined so that X_t = Y_t for all t ≥ τ; so let us assume this.

[Figure: two coupled trajectories started from x and y; they coalesce at time τ and move together afterwards.]

• The pair (X_t, Y_t) (for a given t) is a coupling of P^t(x, ·) and P^t(y, ·).


Mixing and Coupling

• Since X_t has distribution P^t(x, ·) and Y_t has distribution P^t(y, ·), using the coupling characterization of total variation distance,

P(τ > t) = P(X_t ≠ Y_t) ≥ dTV(P^t(x, ·), P^t(y, ·)) .

• Combined with the inequality

dTV(P^t(x, ·), π) ≤ max_{y∈Ω} dTV(P^t(x, ·), P^t(y, ·)) ,

if there is a coupling (X_t, Y_t) for every pair of initial states (x, y), then this shows that

d(t) = max_{x∈Ω} dTV(P^t(x, ·), π) ≤ max_{x,y} dTV(P^t(x, ·), P^t(y, ·)) ≤ max_{x,y} P_{x,y}(τ > t) .


Mixing for lazy random walk on the n-cycle

• Use the coupling which selects at each move one of the “particles” at random; the chosen particle is equally likely to move clockwise as counterclockwise.

• The clockwise difference between the particles, D_t, is a simple random walk on {0, 1, . . . , n}.

• When D_t ∈ {0, n}, the two particles have collided.
• If τ is the time until a simple random walk on {0, 1, . . . , n} hits an endpoint when started at k, then (see the simulation sketch below)

E_k τ = k(n − k) ≤ n^2/4 .


RW on n-cycle, continued

• By Markov’s inequality,

P(τ > t) ≤ Eτ/t ≤ n^2/(4t) .

• Using the coupling inequality,

d(t) ≤ max_{x,y} P(τ > t) ≤ n^2/(4t) .

• Taking t ≥ n^2 yields d(t) ≤ 1/4, whence

tMIX ≤ n^2 .


Random Walk on d-dimensional Torus

• Ω = (Z/nZ)^d. The walk remains at its current position with probability 1/2.

• Couple two particles as follows:
  • Select one of the d coordinates at random.
  • If the particles agree in the selected coordinate, move the walks together in this coordinate: both walks either make a clockwise move, make a counterclockwise move, or stay put.
  • If the particles disagree in the chosen coordinate, flip a coin to decide which walker will move. Move the selected walk either clockwise or counterclockwise, each with probability 1/2.


[Figure: example coupling moves for two walkers on the 6-dimensional torus, with coordinate vectors such as x = (1,4,4,7,5,8) and y = (2,4,8,7,4,2): a coordinate where the walks agree is updated in both, while a coordinate where they disagree is updated in only one of them.]


• Consider the clockwise difference between the i-th coordinates of the two particles. It moves at rate 1/d, and when it does move, it performs simple random walk on {0, 1, . . . , n}, with absorption at 0 and n. Thus the expected time to couple the i-th coordinate is bounded above by dn^2/4.

• Since there are d coordinates, the expected time for all of them to couple is not more than

d × dn^2/4 = d^2 n^2/4 .

• By the coupling theorem,

tMIX ≤ d^2 n^2 .

Exercise: Improve the d-dependence in this bound.


RW on hypercube

[Figure: two coupled walks on the hypercube, e.g. 0110001010 and 0011010011; the same coordinate is selected for updating in both walks, and a coin toss decides the replacement bit.]

• Consider the lazy random walk on the hypercube {0, 1}^n. Sites are neighbors if they differ in exactly one coordinate.

• To update the two walks, first pick a coordinate at random. The same coordinate is used for both walks.

• Toss a coin to determine if the bit at the chosen coordinate is replaced by a 1 or a 0. The same bit is used for both walks.

• No matter the initial positions of the two walks, once every coordinate has been selected, the two walks agree.

• This reduces to a “coupon collector’s” problem: how many times must a coordinate be drawn at random before every coordinate is chosen? (A simulation sketch follows.)
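A minimal sketch of this coupling (our own helper name; bit lists stand for points of {0, 1}^n):

    import random

    def couple_hypercube(x, y):
        # x, y: lists of n bits. Each step uses the same coordinate and the
        # same replacement bit for both walks, so each walk is marginally
        # the lazy random walk. Returns the coupling time.
        n, t = len(x), 0
        while x != y:
            i = random.randrange(n)              # coordinate selected for updating
            x[i] = y[i] = random.randint(0, 1)   # coin for the replacement bit
            t += 1
        return t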


Coupon collector

• Let A_k(t) be the event that the k-th coupon has not been collected by time t.

• Observe that

P(A_k(t)) = (1 − 1/n)^t ≤ e^{−t/n} .

• Consequently,

P(∪_{k=1}^n A_k(t)) ≤ ∑_{k=1}^n e^{−t/n} = n e^{−t/n} .

• In other words, if τ is the time until all coupons have been collected,

P(τ > n log n + cn) = P(∪_{k=1}^n A_k(n log n + cn)) ≤ e^{−c} .
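A minimal empirical check of this tail bound (our own helper name):

    import math
    import random

    def coupon_time(n):
        # Number of uniform draws from {0, ..., n-1} until every value is seen.
        seen, t = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            t += 1
        return t

    # Check P(tau > n log n + cn) <= e^{-c} for n = 100, c = 2.
    n, c, trials = 100, 2.0, 10000
    threshold = n * math.log(n) + c * n
    tail = sum(coupon_time(n) > threshold for _ in range(trials)) / trials
    print(tail, "<=", math.exp(-c))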


Returning to the hypercube,

d(n log n + cn) ≤ P(τ > n log n + cn) ≤ e^{−c} ,

whence

tMIX(ε) ≤ n log n + n log(1/ε) .


Strong stationary times

• Random mapping representation of a Markov chain P: an i.i.d. sequence (Z_t) and a map f such that X_t = f(X_{t−1}, Z_t), with X_0 = x (a sketch follows after this list).

• The sequence (Z_t) has more information than the chain (X_t).
• A randomized stopping time is a stopping time for the sequence (Z_t); it is not necessarily a stopping time of (X_t)!
• A strong stationary time is a randomized stopping time τ such that X_τ has distribution π and is independent of τ. That is,

P_x(τ = t, X_τ = y) = P_x(τ = t) π(y) .
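A minimal sketch of one standard random mapping representation (the inverse-CDF construction; the helper names are ours):

    import random

    def f(P, x, z):
        # f(x, z): map a uniform z in [0, 1) to the next state by walking
        # through the cumulative distribution of the row p(x, .).
        acc = 0.0
        for y, p in enumerate(P[x]):
            acc += p
            if z < acc:
                return y
        return len(P[x]) - 1      # guard against rounding at z close to 1

    def run_chain(P, x0, T):
        # X_t = f(X_{t-1}, Z_t) with (Z_t) i.i.d. uniform on [0, 1).
        xs = [x0]
        for _ in range(T):
            xs.append(f(P, xs[-1], random.random()))
        return xs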

Proposition

If τ is a strong stationary time, then

d(t) = max_x ‖P^t(x, ·) − π‖TV ≤ max_x P_x(τ > t) .


Top-to-random shuffle

• Top-to-random shuffle: take the top card and insert it uniformly at random into the deck.

• Strong stationary time τ_top: one move after the first occasion when the original bottom card has moved to the top of the deck.

• τ_top has the same distribution as the coupon collector’s time, and hence P(τ_top > n log n + cn) ≤ e^{−c}.

• tMIX(ε) ≤ n log n + n log(1/ε). (A simulation sketch follows.)
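A minimal simulation of τ_top (our own helper name; deck[0] is the top card):

    import random

    def tau_top(n):
        # Returns one move after the original bottom card (label n-1)
        # first reaches the top: exactly the move that inserts it
        # uniformly into the deck.
        deck = list(range(n))          # card n-1 starts at the bottom
        t = 0
        while True:
            t += 1
            card = deck.pop(0)                         # take the top card
            deck.insert(random.randrange(n), card)     # uniform reinsertion
            if card == n - 1:
                return t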


Random walk on the hypercube - revisited

• The lazy walk on the hypercube can be viewed as a dynamics on {−1, 1}^n, where at each step a coordinate is selected and updated uniformly at random.

• The strong stationary time τ_refresh: the first time when all the coordinates have been selected at least once for updating.

• τ_refresh has the same distribution as the coupon collector’s time.
• tMIX(ε) ≤ n log n + n log(1/ε).
• Is this tight?

Answer: in fact tMIX = (1/2) n log n + O(n), and there is a cutoff.
