Mixing in Product Spaces
Elchanan Mossel
Elchanan Mossel Mixing in Product Spaces
Poincare Recurrence Theorem

Theorem (Poincare, 1890)
Let f : X → X be a measure preserving transformation and let E ⊂ X be measurable. Then
P[x ∈ E : f^n(x) ∉ E for all n > N(x)] = 0,
i.e. almost every point of E returns to E infinitely often.
One of the first results in Ergodic Theory.
Long term mixing.
This talk is about short term mixing.
Finite Markov Chains

As a first example consider a finite Markov chain.
Let M be a k × k doubly stochastic symmetric matrix.
Pick X_0 uniformly at random from {1, . . . , k}.
Given X_i = a, let X_{i+1} = b with probability M_{a,b}.

Theorem (Long Term Mixing for Markov Chains)
Suppose that, other than 1, all eigenvalues λ_i of M satisfy |λ_i| ≤ λ < 1. Then for any two sets A, B ⊂ [k], it holds that
|P[X_0 ∈ A, X_t ∈ B] − P[A]P[B]| ≤ λ^t.
Short Term Mixing for Markov Chains

Theorem
|P[X_0 ∈ A, X_1 ∈ B] − P[A]P[B]| is upper bounded by
λ √(P[A](1 − P[A]) P[B](1 − P[B]))

Shows: mixing in one step for large sets.
Proof: Write 1_A = P[A]1 + f and 1_B = P[B]1 + g, where f, g ⊥ 1. Then
P[X_0 ∈ A, X_1 ∈ B] = (1/k)(P[A]1 + f)^T M (P[B]1 + g) = P[A]P[B] + (1/k) f^T M g,
and
(1/k)|f^T M g| ≤ (λ/k)‖f‖_2 ‖g‖_2 = λ √(P[A](1 − P[A]) P[B](1 − P[B])).
Also called the Expander Mixing Lemma. Used a lot in computer science, e.g. in (de)randomization.
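The lemma is easy to sanity-check numerically. A minimal sketch (assuming numpy; the lazy walk on a 6-cycle is an illustrative choice of M, not one from the talk), verifying the bound over every pair of subsets:

```python
import itertools

import numpy as np

k = 6
# Lazy random walk on a k-cycle: symmetric and doubly stochastic.
S = np.roll(np.eye(k), 1, axis=0)
M = 0.5 * np.eye(k) + 0.25 * (S + S.T)
eigs = np.linalg.eigvalsh(M)          # ascending; top eigenvalue is 1
lam = max(abs(eigs[:-1]))             # lambda = 3/4 for this chain

# Check |P[X_0 in A, X_1 in B] - P[A]P[B]| <= lam * sqrt(pA(1-pA)pB(1-pB))
# over all 2^k * 2^k pairs of subsets A, B of {0, ..., k-1}.
for bitsA in itertools.product([0, 1], repeat=k):
    a = np.array(bitsA, dtype=float)
    for bitsB in itertools.product([0, 1], repeat=k):
        b = np.array(bitsB, dtype=float)
        pA, pB = a.sum() / k, b.sum() / k
        lhs = abs(a @ M @ b / k - pA * pB)  # P[X_0 in A, X_1 in B] = 1_A^T M 1_B / k
        rhs = lam * np.sqrt(pA * (1 - pA) * pB * (1 - pB))
        assert lhs <= rhs + 1e-12
```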
The tensor property

Consider (Y_1, Z_1), . . . , (Y_n, Z_n) drawn independently from the distribution of (X_0, X_1).
Equivalently, the transition matrix from Y = (Y_1, . . . , Y_n) to Z = (Z_1, . . . , Z_n) is M^{⊗n}.
The theorem then implies that for any sets A, B ⊂ [k]^n:
|P[Y ∈ A, Z ∈ B] − P[A]P[B]| ≤ λ √(P[A](1 − P[A]) P[B](1 − P[B]))
Follows immediately from tensorization of the spectrum: every nontrivial eigenvalue of M^{⊗n} is a product of eigenvalues of M with at least one factor of modulus at most λ, so it is still at most λ in absolute value.
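The tensorized bound can be checked the same way on a small product chain. A sketch (assuming numpy): here M is the lazy walk on a 3-cycle, an illustrative choice with λ = 1/4, and the spectrum of M ⊗ M shows the same λ works for the product:

```python
import numpy as np

k = 3
# Lazy walk on a 3-cycle: eigenvalues 1, 1/4, 1/4.
M = 0.5 * np.eye(k) + 0.25 * (np.ones((k, k)) - np.eye(k))
Mn = np.kron(M, M)                    # transition matrix of the 2-fold product chain
K = k ** 2
w = np.linalg.eigvalsh(Mn)            # ascending; products of eigenvalues of M
lam = max(abs(w[:-1]))
assert abs(lam - 0.25) < 1e-12        # spectrum tensorizes: still lambda = 1/4

V = ((np.arange(2 ** K)[:, None] >> np.arange(K)) & 1).astype(float)  # all 512 subsets
p = V.sum(axis=1) / K                 # P[A] for each subset under the uniform start
J = V @ (Mn / K) @ V.T                # J[i, j] = P[Y in A_i, Z in A_j]
lhs = np.abs(J - np.outer(p, p))
rhs = lam * np.sqrt(np.outer(p * (1 - p), p * (1 - p)))
assert np.all(lhs <= rhs + 1e-12)
```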
Log Sobolev inequalities

Entropy, Log Sobolev and hyper-contraction
A similar story can be told using more sophisticated analytic tools. It is easier to work with Markov semi-groups T_t = e^{−tL}.

Entropy, Dirichlet Form
Ent(f) = E(f log f) − Ef · log Ef
E(f, g) = E(f Lg) = E(g Lf) = E(g, f) = −(d/dt) E[f T_t g] |_{t=0}.

Definition of Log-Sob
p-logSob(C) ⟺ ∀f, Ent(f^p) ≤ (C p^2 / (4(p − 1))) E(f^{p−1}, f)   (p ≠ 0, 1)
1-logSob(C) ⟺ ∀f, Ent(f) ≤ (C/4) E(f, log f)
0-logSob(C) ⟺ ∀f, Var(log f) ≤ −(C/2) E(f, 1/f)
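The p = 2 case reads Ent(f^2) ≤ C E(f, f). On the uniform two-point space with L = I − E this can be checked on a grid, taking C = 2, the classical two-point log-Sobolev constant (an assumption here, not stated on the slide). A sketch assuming numpy:

```python
import numpy as np

mu = np.array([0.5, 0.5])            # uniform measure on a two-point space

def ent(g):
    """Ent(g) = E[g log g] - E[g] log E[g] for positive g."""
    m = mu @ g
    return mu @ (g * np.log(g)) - m * np.log(m)

def dirichlet(f, g):
    """E(f, g) = E[f L g] with L = I - E, i.e. the covariance of f and g."""
    return mu @ (f * (g - mu @ g))

C = 2.0                              # assumed two-point log-Sobolev constant
ratios = []
for a in np.linspace(0.01, 3, 120):
    for b in np.linspace(0.01, 3, 120):
        f = np.array([a, b])
        e = dirichlet(f, f)
        if e > 1e-12:
            ratios.append(ent(f ** 2) / e)
            assert ent(f ** 2) <= C * e + 1e-9
# The ratio approaches 2 as f becomes nearly constant, so C = 2 is sharp here.
```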
Log Sob. Inequalities and Hyper-Contraction

Hyper-Contraction (Gross, Nelson 1960 ... )
r-logSob with constant C implies
‖T_t f‖_p ≤ ‖f‖_q,   t ≥ (C/4) log((p − 1)/(q − 1)),   1 < q < p < r or r′ < q < p
=⇒ |E[g(X_0) f(X_t)]| = |E[g T_t f]| ≤ ‖g‖_{p′} ‖T_t f‖_p ≤ ‖g‖_{p′} ‖f‖_q
If f = 1_A and g = 1_B, get:
P[X_0 ∈ A, X_t ∈ B] ≤ ‖1_A‖_q ‖1_B‖_{p′} = P[A]^{1/q} P[B]^{1/p′}.
Now optimize over the norms to get a better bound than Cauchy–Schwarz.
Reverse-Hyper-Contraction

Log-Sobolev and Rev. Hyper-Contraction (M-Oleszkiewicz-Sen-13)
Let T_t = e^{−tL} be a general Markov semi-group satisfying
2-LogSob with constant C, or
1-LogSob inequality with constant C.
Then for all q < p < 1, all positive f, g and all t ≥ (C/4) log((1 − q)/(1 − p)) it holds that
‖T_t f‖_q ≥ ‖f‖_p =⇒
E[g(X_0) f(X_t)] = E[g T_t f] ≥ ‖g‖_{q′} ‖f‖_p
Short-Time Implications

Theorem (M-Oleszkiewicz-Sen-13; Short-Time Implications)
Let T_t = e^{−tL}, where L satisfies a 1- or 2-LogSob inequality with constant C. Let A, B ⊂ Ω^n with P[A] ≥ ε and P[B] ≥ ε. Then:
P[X(0) ∈ A, X(t) ∈ B] ≥ ε^{2/(1−e^{−2t/C})}

Comments
1. Works for small sets too.
2. Tensorizes.
3. Some examples where it is (almost) tight.
4. Uses in social choice analysis, queuing theory.
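On the two-point chain the theorem's bound can be compared with the exact hitting probability (a sketch assuming numpy; taking C = 2 for L = I − E is an assumption, as above):

```python
import numpy as np

C, t, eps = 2.0, 1.0, 0.5
r = np.exp(-t)
# For L = I - E on two points, T_t = e^{-t} I + (1 - e^{-t}) E, so
# P[X(0) = a, X(t) = b] = (1/2) * ((1 + r)/2 if a == b else (1 - r)/2).
joint = 0.25 * np.array([[1 + r, 1 - r], [1 - r, 1 + r]])
p_hit = joint[0, 0]                  # A = B = {0}, so P[A] = P[B] = eps = 1/2
bound = eps ** (2.0 / (1.0 - np.exp(-2 * t / C)))
assert p_hit >= bound                # ~0.342 vs ~0.112
```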
Comment: typical application MCMC

Long Time Behavior
Log Sobolev inequalities play a major role in analyzing long term mixing of Markov chains, in particular in the analysis of mixing times (Diaconis, Saloff-Coste etc.)

Long Time Behavior
For a continuous time Markov chain with spectral gap λ and 2-LogSob constant C, the ε-total variation mixing time is bounded by
(1/λ)(log(1/π_*) + log(1/ε))
and by
(1/C)(log log(1/π_*) + log(1/ε)).
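The spectral-gap bound can be checked exactly on a small chain by computing the heat kernel (a sketch assuming numpy; the simple walk on an 8-cycle is an illustrative choice):

```python
import numpy as np

k, eps = 8, 0.05
# Continuous-time simple random walk on a k-cycle: generator L = I - M.
M = 0.5 * (np.roll(np.eye(k), 1, axis=0) + np.roll(np.eye(k), -1, axis=0))
L = np.eye(k) - M
w, U = np.linalg.eigh(L)
gap = np.sort(w)[1]                      # spectral gap lambda = 1 - cos(2*pi/k)
pi_star = 1.0 / k                        # minimum of the (uniform) stationary law
t = (1.0 / gap) * (np.log(1.0 / pi_star) + np.log(1.0 / eps))
Ht = U @ np.diag(np.exp(-t * w)) @ U.T   # heat kernel e^{-tL}
tv = 0.5 * np.abs(Ht - 1.0 / k).sum(axis=1).max()  # worst-case TV from stationarity
assert tv <= eps                         # the chain has mixed by the bounded time
```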
What are these lectures about?

High Dimensional Phenomena
High dimensional mixing: mixing of product processes on product spaces Ω^n with n large.

Tight bounds
For which processes, given measures a and b, can we find precise upper/lower bounds for
sup(P[X_0 ∈ A, X_t ∈ B] : P[A] = a, P[B] = b)
Interested in product spaces/processes of dimension n and answers as n → ∞.
Most important examples / techniques from probability / analysis.
What are these lectures about?

Multi-step processes
How to bound P[X_0 ∈ A_0, X_1 ∈ A_1, . . . , X_k ∈ A_k] for processes X_0, . . . , X_k?
Interested in product spaces/processes of dimension n and answers as n → ∞.
Most important examples / techniques from additive combinatorics.
What are these lectures about?
And more
Theory that does both?
Applications?
Today: tight bounds
Borell’s result.
Open Problem: The Boolean cube.
The state of affairs - partition into 3 parts or more.
Two Examples: Gaussian, Boolean

Correlated pairs (M-O'Donnell-Regev-Steif-Sudakov-05):
Let x, y ∈ {−1, 1}^n be e^{−t} correlated:
x is chosen uniformly and y is a T_t correlated version,
i.e. E[x_i y_i] = e^{−t} for all i independently.
Let A, B ⊂ {−1, 1}^n with P[A] ≥ ε and P[B] ≥ ε.
Then: P[x ∈ A, y ∈ B] ≥ ε^{2/(1−e^{−t})}
Easy to prove when A = B ...

Gaussian Version
Let x, y ∈ R^n be two Gaussian vectors:
x ∼ N(0, I_n), y ∼ N(0, I_n), E[x_i y_j] = e^{−t} δ_{i,j}
Let A, B ⊂ R^n with P[A] ≥ ε and P[B] ≥ ε.
Then: P[x ∈ A, y ∈ B] ≥ ε^{2/(1−e^{−t})}
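For small n the Boolean bound can be verified exactly by enumerating the correlated pair (a sketch assuming numpy; the majority-style set A is an illustrative choice):

```python
import itertools

import numpy as np

n, t = 4, np.log(2.0)
rho = np.exp(-t)                         # coordinate correlation, here 1/2
cube = list(itertools.product([-1, 1], repeat=n))
A = [x for x in cube if sum(x) > 0]      # majority-style set, P[A] = 5/16
pA = len(A) / 2 ** n
# Exact joint law of the correlated pair: P[x, y] = 2^{-n} * prod_i (1 + rho*x_i*y_i)/2.
p_hit = sum(
    2.0 ** -n * np.prod([(1 + rho * xi * yi) / 2 for xi, yi in zip(x, y)])
    for x in A for y in A
)
bound = pA ** (2.0 / (1.0 - np.exp(-t)))
assert p_hit >= bound
```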
Borell's Result and Open Problems

Borell (85): In the Gaussian case the maximum and minimum of P[x ∈ A, y ∈ B], as a function of P[A] and P[B], are obtained for parallel half-spaces.
We do not know what the optimum is in {−1, 1}^n. In particular:

Open Problem:
lim_{n→∞} min(P[X ∈ A, Y ∈ B] : A, B ⊂ {−1, 1}^n, P[A] = P[B] = 1/4)
and similarly for max.
Partition into 3 or more parts, even in Gaussian space.
If there is time before the break ...

A cute proof of a special case of Borell's result.
Connections to social choice theory.
Simple Example 1

Cosmic coin problem (M-O'Donnell-05):
x ∈ {−1, 1}^n uniform.
(y^i)_{i=1}^m conditionally independent given x.
Each pair (x, y^i) is ρ-correlated.
Problem: How large can P[y^1 ∈ A, . . . , y^m ∈ A] be?
Simple Example 2

(y^{i,j})_{1≤i<j≤m} is an exchangeable collection of vectors in {−1, 1}^n.
If |I ∩ J| = 1 then y^I, y^J are −1/3 correlated.
Otherwise independent.
Why?
If n voters rank alternatives uniformly at random, the pairwise preferences between alternatives will be given by the collection y.
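The −1/3 correlation for overlapping pairs can be verified by enumerating the six rankings of three alternatives (a sketch; the helper name pref is illustrative):

```python
import itertools
from fractions import Fraction

def pref(order, i, j):
    """+1 if alternative i is ranked above j in the given order, else -1."""
    return 1 if order.index(i) < order.index(j) else -1

orders = list(itertools.permutations(range(3)))   # all 6 uniform rankings
# Overlapping pairs, e.g. I = (0,1) and J = (1,2) with |I ∩ J| = 1:
corr = Fraction(sum(pref(o, 0, 1) * pref(o, 1, 2) for o in orders), len(orders))
assert corr == Fraction(-1, 3)
```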
Full support finite Ω using hyper-contraction

Thm: More General Reverse Hypercontractivity Theorem (M-Oleszkiewicz-Sen-13)
Let the measure Ψ over a finite Ω^k satisfy min_{x_1,...,x_k ∈ Ω} Pr[X_1 = x_1, . . . , X_k = x_k] = α > 0 and have equal marginals.
Consider the distribution Ψ^n and let A_1, . . . , A_k ⊆ Ω^n with µ(A_i) ≥ µ. Then:
Pr[X_1 ∈ A_1, . . . , X_k ∈ A_k] ≥ µ^{O(1/α)},
where (X_1(i), . . . , X_k(i)) are i.i.d. according to Ψ.

Note
This is a key tool for analyzing the examples above as well as many others.
Notation

[Figure: an ℓ × n array of random variables X_i^{(j)}, 1 ≤ j ≤ ℓ, 1 ≤ i ≤ n.]
The columns X_i = (X_i^{(1)}, . . . , X_i^{(ℓ)}) ∈ Ω := Ω^ℓ are i.i.d. according to P; the marginals of P are π_j.
The rows X^{(j)} = (X_1^{(j)}, . . . , X_n^{(j)}) ∈ Ω := Ω^n are distributed according to π_j := π_j^n.
The whole array X ∈ Ω := Ω^{n·ℓ} is distributed according to P := P^n; sets S ⊆ Ω.
α(P) := min_{x ∈ Ω} P(x, x, . . . , x); ρ(P) is the correlation defined below.
Lower Bounds

We are mostly interested in two types of lower bounds:
Set hitting: lower bounds on
P[X_1 ∈ A_1, . . . , X_k ∈ A_k]
in terms of P[A_1], . . . , P[A_k].
Same set hitting: lower bounds on
P[X_1 ∈ A, . . . , X_k ∈ A]
in terms of P[A].
Set hitting will require something ... - e.g. X_1 = X_2 = . . . = X_k.
Gaussian Bounds

Borell (85): k = 2 - parallel half-spaces are optimal (also Isaksson-Mossel, Neeman).
By a Reverse Brascamp-Lieb inequality (Ledoux, Chen-Dafnis-Paouris 14-15), for A, . . . , C ⊂ R^n:
P[U ∈ A, . . . , Z ∈ C] ≥ (P[A] · · · P[C])^{1/(1−ρ^2)},
where ρ is the second eigenvalue of Σ.
Doesn't require independence of coordinates.
Full Support Case

Thm: More General Reverse Hypercontractivity Theorem (M-Oleszkiewicz-Sen-13)
Let the measure Ψ over a finite Ω^k satisfy min_{x_1,...,x_k ∈ Ω} Pr[X_1 = x_1, . . . , X_k = x_k] = α > 0 and have equal marginals. Let A_1, . . . , A_k ⊆ Ω^n with µ(A_i) ≥ µ. Then:
Pr[X_1 ∈ A_1, . . . , X_k ∈ A_k] ≥ µ^{O(1/α)},
where (X_1(i), . . . , X_k(i)) are i.i.d. according to Ψ.
Non full support?

What if the support of Ψ is not full?
Do we care?
Maybe: this is what additive combinatorics is all about.
In particular: finite combinatorics in finite field models (Green-04 ... ).
Many other applications in combinatorics and computer science.
Additive combinatorics perspective

Example:
Theorem (Finite Field Roth Theorem)
Let Y, R be chosen uniformly at random from F_3^n.
Then for every µ > 0 there exist c(µ) > 0 and N(µ) such that if n ≥ N(µ) and
A ⊂ F_3^n satisfies P[A] ≥ µ, then:
P[Y ∈ A, Y + R ∈ A, Y + 2R ∈ A] ≥ c(µ).
Why is this true?
Fourier Obstructions

Theorem (Finite Field Roth Theorem - Analysis)
Let Y, R be chosen uniformly at random from F_3^n. Let A, B, C ⊂ F_3^n. Then
|P[Y ∈ A, Y + R ∈ B, Y + 2R ∈ C] − P[A]P[B]P[C]| ≤ max_{γ ≠ 0} |1̂_A(γ)|
The only obstruction to uniformity is linear structure. If A = B = C, a high Fourier coefficient =⇒ one can restrict to a linear subspace with higher density. Density increase arguments ...
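The linear obstruction is visible in a toy computation: when A is a subspace, the 3-AP hitting probability far exceeds the product heuristic P[A]^3 (a sketch; the codimension-1 subspace is an illustrative choice):

```python
import itertools
from fractions import Fraction

n = 2
pts = list(itertools.product(range(3), repeat=n))        # F_3^n
A = {x for x in pts if x[0] == 0}                        # a subspace, density 1/3
hits = sum(
    1
    for Y in pts for R in pts
    if Y in A
    and tuple((y + r) % 3 for y, r in zip(Y, R)) in A
    and tuple((y + 2 * r) % 3 for y, r in zip(Y, R)) in A
)
density = Fraction(hits, len(pts) ** 2)
# The whole progression lies in A iff Y[0] == R[0] == 0, so 1/9 >> (1/3)^3.
assert density == Fraction(1, 9)
```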
Higher Order Arithmetic Obstructions

Furstenberg-Weiss (80s): For longer arithmetic progressions, there are obstructions other than Fourier.
Gowers: Obstructions can be identified using the Gowers norms.
Again - use the obstruction to your benefit.

Thm: (Gowers 08; Rodl and Skokan 04, 06):
If q is prime and ℓ ≤ q, then for every µ > 0 there exist c(µ) > 0 and N(µ) such that if n ≥ N(µ) and A ⊂ F_q^n satisfies P[A] ≥ µ, then:
P[Y ∈ A, Y + R ∈ A, . . . , Y + (ℓ − 1)R ∈ A] ≥ c(µ),
where Y, R ∈ F_q^n are chosen uniformly at random.
Question: Is the additive structure necessary?
Obstruction to Chaos

Consider the support of Ψ as a graph G with vertex set V = all atoms with non-zero weight, and edges between any two atoms that differ in one coordinate.
We say that ρ < 1 if the graph G is connected.
More formally:

Definition
ρ(P, S, T) := sup{ Cov[f(X(S)), g(X(T))] | f : Ω(S) → R, g : Ω(T) → R, Var[f(X(S))] = Var[g(X(T))] = 1 }.
The correlation of P is ρ(P) := max_{j ∈ [ℓ]} ρ(P, {j}, [ℓ] \ {j}).
The quest for a unifying theory
Is there one theory that explains both the noisy examples and theadditive theory?
Example

Let X be uniform in F_3^n.
Let Y_i = X_i or X_i + 1, with probability 1/2 independently for each coordinate.
Theorem =⇒ P[X ∈ A, Y ∈ A] ≥ c(P[A]).
Motivation from understanding "parallel repetition".
Does not follow from hyper-contraction, nor does it follow from additive techniques ...
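For structured sets the hitting probability in this example can be computed exactly by enumeration (a sketch; the set A = {x : Σ_i x_i ≡ 0 mod 3} is an illustrative choice):

```python
import itertools
from fractions import Fraction

n = 3
pts = list(itertools.product(range(3), repeat=n))
A = {x for x in pts if sum(x) % 3 == 0}          # an "arithmetic" set, P[A] = 1/3
num = den = 0
for X in pts:                                     # X uniform on F_3^n
    for bits in itertools.product(range(2), repeat=n):
        den += 1                                  # Y_i = X_i or X_i + 1, each w.p. 1/2
        Y = tuple((x + b) % 3 for x, b in zip(X, bits))
        num += int(X in A and Y in A)
p = Fraction(num, den)
# Given X in A, Y in A iff sum(bits) = 0 mod 3 (prob. 2/8), so p = 1/3 * 1/4.
assert p == Fraction(1, 12) and p > 0
```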
A General Result

Theorem (+ Hazla, Holenstein)
Suppose (X, Y) is distributed in a finite Ω^2 such that:
α = min_a P[X = Y = a] > 0.
P[X = a] = P[Y = a] for all a.
Then for any set A ⊂ Ω^n with P_{X^{⊗n}}[A] = P_{Y^{⊗n}}[A] ≥ µ it holds that
P[X ∈ A, Y ∈ A] ≥ c(α, µ) > 0

Our c is pretty bad:
c = 1/exp(exp(exp(1/µ^D))), D = D(α)
Related to the fact that the proof is interesting:
1. Lose in "Regularity Lemma" type arguments.
2. Lose in "Invariance", transforming the problem to a Gaussian problem.
A Markov Chain Theorem and a general process theorem

Theorem [+ Hazla, Holenstein]
Let X_i, Y_i, Z_i, . . . , W_i be a Markov chain over Ω with min_{x ∈ Ω} Pr[X_i = Y_i = Z_i = . . . = W_i = x] = β > 0 and uniform marginals. Let A ⊆ Ω^n, µ(A) = µ > 0. Then
Pr[X ∈ A ∧ Y ∈ A ∧ Z ∈ A ∧ . . . ∧ W ∈ A] ≥ f(µ, β) > 0.

Theorem [+ Hazla, Holenstein]
Let X_i, Y_i, Z_i, . . . , W_i be distributed over Ω^k with min_{x ∈ Ω} Pr[X_i = Y_i = Z_i = . . . = W_i = x] = β > 0 and uniform marginals. Suppose further that ρ(X_i, Y_i, . . . , W_i) < 1. Let A ⊆ Ω^n, µ(A) = µ > 0. Then
Pr[X ∈ A ∧ Y ∈ A ∧ Z ∈ A ∧ . . . ∧ W ∈ A] ≥ f(µ, β) > 0.
The condition ρ < 1

Weaker than full support.
Does not hold in arithmetic setups.
ρ < 1 iff the support of Ψ is connected with respect to changing one coordinate at a time.
Example: (x, y) ∈ F_3^2 where y ∈ {x, x + 1} has ρ < 1 but not full support.
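That ρ < 1 for this example can be computed concretely: for a pair, ρ is the maximal correlation, which by a standard fact equals the second singular value of the normalized joint-distribution matrix. A sketch assuming numpy:

```python
import numpy as np

# (x, y) in F_3 x F_3 with x uniform and y = x or x + 1, each w.p. 1/2.
P = np.zeros((3, 3))
for a in range(3):
    P[a, a] = P[a, (a + 1) % 3] = 1.0 / 6.0
px, py = P.sum(axis=1), P.sum(axis=0)
Q = P / np.sqrt(np.outer(px, py))        # Q[a, b] = P[a, b] / sqrt(px[a] * py[b])
s = np.linalg.svd(Q, compute_uv=False)   # descending singular values
assert abs(s[0] - 1.0) < 1e-12           # top singular value is always 1
rho = s[1]
assert rho < 1                           # connected support: here rho = 1/2
```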
Open Problems

Still searching for a unified theory.
Concrete Example: Suppose Ψ is uniform over
{(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 2), (1, 2, 0), (2, 0, 1)}
ρ = 1 but not arithmetic.
We do not understand this case.
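For the concrete example, ρ = 1 can be checked directly with the same second-singular-value characterization: the pair (X_2, X_3) determines X_1, so the maximal correlation between X_1 and (X_2, X_3) is 1. A sketch assuming numpy:

```python
import numpy as np

support = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 2), (1, 2, 0), (2, 0, 1)]
# Joint law of X_1 versus the pair (X_2, X_3), encoded as a 3 x 9 matrix.
P = np.zeros((3, 9))
for (x1, x2, x3) in support:
    P[x1, 3 * x2 + x3] += 1.0 / 6.0
p1, p23 = P.sum(axis=1), P.sum(axis=0)
mask = p23 > 0                            # restrict to atoms of positive weight
Q = P[:, mask] / np.sqrt(np.outer(p1, p23[mask]))
s = np.linalg.svd(Q, compute_uv=False)
# Each column of Q has one nonzero entry: (X_2, X_3) determines X_1, so rho = 1.
assert abs(s[1] - 1.0) < 1e-12
```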
Questions??
Thank you!