-
A Matrix Expander Chernoff Bound
Ankit Garg, Yin Tat Lee, Zhao Song, Nikhil Srivastava
UC Berkeley
-
Vanilla Chernoff Bound
Thm [Hoeffding, Chernoff]. If X₁, …, Xₙ are independent mean-zero random variables with |Xᵢ| ≤ 1, then
Pr[ |1/n Σᵢ Xᵢ| ≥ ε ] ≤ 2 exp(−nε²/4)
Two Extensions:
1. Dependent random variables
2. Sums of random matrices
-
Expander Chernoff Bound [AKS'87, G'94]
Thm [Gillman'94]: Suppose G = (V, E) is a regular graph with transition matrix P which has second eigenvalue λ. Let f: V → ℝ be a function with |f(v)| ≤ 1 and Σᵥ f(v) = 0. Then, if v₁, …, vₙ is a stationary random walk:
Pr[ |1/n Σᵢ f(vᵢ)| ≥ ε ] ≤ 2 exp(−Ω((1 − λ) n ε²))
Implies a walk of length n ≈ (1 − λ)⁻¹ concentrates around the mean.
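The role of the spectral gap shows up already in a small simulation, comparing a slowly mixing cycle against the complete graph with self-loops (λ = 0). The graphs, f, and parameters below are illustrative choices, not the talk's examples:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, trials = 30, 150, 200

# f = +1 on one contiguous half of the vertices, -1 on the other (mean zero, |f| <= 1).
f = np.array([1.0] * (N // 2) + [-1.0] * (N // 2))

def walk_average(step):
    # Average of f over a stationary random walk (uniform start).
    v = int(rng.integers(N))
    total = 0.0
    for _ in range(n):
        v = step(v)
        total += f[v]
    return total / n

# Cycle: spectral gap 1 - lambda ~ 1/N^2, slow mixing, weak concentration.
cycle_step = lambda v: (v + (1 if rng.integers(2) else -1)) % N
# Complete graph with self-loops (lambda = 0): each step is a fresh uniform sample.
complete_step = lambda v: int(rng.integers(N))

dev_cycle = np.mean([abs(walk_average(cycle_step)) for _ in range(trials)])
dev_complete = np.mean([abs(walk_average(complete_step)) for _ in range(trials)])
print(dev_cycle, dev_complete)
```

With these parameters the walk on the cycle rarely leaves its starting half, so its empirical average stays far from zero, while the fast-mixing walk concentrates.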
-
Intuition for Dependence on Spectral Gap
[Figure: a graph with f = +1 on one cluster and f = −1 on the other; spectral gap 1 − λ = 1/n²]
A typical random walk takes Ω(n²) steps to see both ±1.
-
Derandomization Motivation
To generate n independent samples from [N] we need n log N bits. If G is 3-regular with constant λ, the walk needs only log N + n log 3 bits.
When n = O(log N), this reduces the randomness quadratically; one can then completely derandomize in polynomial time.
-
Matrix Chernoff Bound
Thm [Rudelson'97, Ahlswede-Winter'02, Oliveira'08, Tropp'11, …]. If X₁, …, Xₙ are independent mean-zero random d × d Hermitian matrices with ‖Xᵢ‖ ≤ 1, then
Pr[ ‖1/n Σᵢ Xᵢ‖ ≥ ε ] ≤ 2d exp(−nε²/4)
A very generic bound (no independence assumptions on the entries). Many applications and martingale extensions (see Tropp).
The factor d is tight because of the diagonal case.
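The diagonal case can be checked numerically (an illustrative sketch, not from the talk): with diagonal Rademacher matrices, the norm of the average is the maximum of d independent scalar averages, and the empirical tail stays below 2d·exp(−nε²/4).

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, trials, eps = 5, 100, 500, 0.5

# X_i = diagonal matrix of d independent Rademacher signs, so
# ||(1/n) sum_i X_i|| = max over d independent scalar averages.
tails = 0
for _ in range(trials):
    signs = rng.choice([-1.0, 1.0], size=(n, d))
    avg_norm = np.abs(signs.mean(axis=0)).max()  # spectral norm of the diagonal average
    tails += int(avg_norm >= eps)

empirical_tail = tails / trials
matrix_bound = 2 * d * np.exp(-n * eps**2 / 4)
print(empirical_tail, matrix_bound)
```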
-
Matrix Expander Chernoff Bound?
Conj [Wigderson-Xiao'05]: Suppose G = (V, E) is a regular graph with transition matrix P which has second eigenvalue λ. Let f: V → ℂ^{d×d} be a function taking Hermitian values with ‖f(v)‖ ≤ 1 and Σᵥ f(v) = 0. Then, if v₁, …, vₙ is a stationary random walk:
Pr[ ‖1/n Σᵢ f(vᵢ)‖ ≥ ε ] ≤ 2d exp(−Ω((1 − λ) n ε²))
Motivated by the derandomized Alon-Roichman theorem.
-
Main Theorem
Thm. Suppose G = (V, E) is a regular graph with transition matrix P which has second eigenvalue λ. Let f: V → ℂ^{d×d} be a function taking Hermitian values with ‖f(v)‖ ≤ 1 and Σᵥ f(v) = 0. Then, if v₁, …, vₙ is a stationary random walk:
Pr[ ‖1/n Σᵢ f(vᵢ)‖ ≥ ε ] ≤ 2d exp(−Ω((1 − λ) n ε²))
Gives a black-box derandomization of any application of matrix Chernoff.
-
1. Proof of Chernoff: reduction to mgf
Thm [Hoeffding, Chernoff]. If X₁, …, Xₙ are independent mean-zero random variables with |Xᵢ| ≤ 1, then Pr[ |1/n Σᵢ Xᵢ| ≥ ε ] ≤ 2 exp(−nε²/4).
Pr[ Σᵢ Xᵢ ≥ nε ] ≤ e^{−tnε} · E exp(t Σᵢ Xᵢ)        (Markov)
  = e^{−tnε} · Πᵢ E e^{t Xᵢ}                        (independence)
  ≤ e^{−tnε} · (1 + t·E[Xᵢ] + t²)ⁿ                  (bounded: |Xᵢ| ≤ 1, t ≤ 1)
  = e^{−tnε} · (1 + t²)ⁿ                            (mean zero)
  ≤ e^{−tnε + nt²} ≤ e^{−nε²/4}                     (choosing t = ε/2)
-
2. Proof of Expander Chernoff
Goal: Show E exp(t Σᵢ f(vᵢ)) ≤ exp(O(n t² / (1 − λ)))
Issue: E exp(t Σᵢ f(vᵢ)) ≠ Πᵢ E exp(t f(vᵢ)).
How to control the mgf without independence?
-
Step 1: Write mgf as quadratic form
E e^{t Σ_{i≤n} f(vᵢ)}
  = Σ_{j₀,…,jₙ ∈ V} Pr(v₀ = j₀) · P(j₀, j₁) ⋯ P(jₙ₋₁, jₙ) · exp(t Σ_{1≤i≤n} f(jᵢ))
  = (1/N) Σ_{j₀,…,jₙ ∈ V} P(j₀, j₁) e^{t f(j₁)} ⋯ P(jₙ₋₁, jₙ) e^{t f(jₙ)}
  = ⟨u, (EP)ⁿ u⟩,  where u = (1/√N, …, 1/√N) and E = diag(e^{t f(1)}, …, e^{t f(N)}).
-
Step 2: Bound quadratic form
Observe: ‖P − J‖ ≤ λ, where J = transition matrix of the complete graph with self-loops.
So for small λ, we should have ⟨u, (EP)ⁿ u⟩ ≈ ⟨u, (EJ)ⁿ u⟩.
Goal: Show ⟨u, (EP)ⁿ u⟩ ≤ exp(O(n t² / (1 − λ))).
Approach 1. Use perturbation theory to show ‖EP‖ ≤ exp(O(t²)).
Approach 2 (Healy'08). Track the projection of the iterates along u.
-
Simplest case: λ = 0
E P E P E P E P u — alternately apply the walk matrix P and the diagonal matrix E to u.
-
Observations
E P E P E P E P u
Observe: P shrinks every vector orthogonal to u by λ.
⟨u, E u⟩ = (1/N) Σ_{v∈V} e^{t f(v)} ≤ 1 + t², by the mean-zero condition.
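Both observations can be checked numerically on a lazy cycle (an illustrative sketch, not the talk's example):

```python
import numpy as np

rng = np.random.default_rng(3)
N, t = 8, 0.5

# Lazy 8-cycle: stay with prob 1/2, move to each neighbor with prob 1/4.
P = 0.5 * np.eye(N)
for j in range(N):
    P[j, (j - 1) % N] += 0.25
    P[j, (j + 1) % N] += 0.25
f = np.array([1, -1] * (N // 2), dtype=float)  # mean zero, |f| <= 1

u = np.ones(N) / np.sqrt(N)
lam = np.sort(np.abs(np.linalg.eigvalsh(P)))[-2]  # second-largest |eigenvalue|

# P shrinks every vector orthogonal to u by a factor lambda.
v = rng.standard_normal(N)
v_perp = v - (v @ u) * u
shrink_ok = np.linalg.norm(P @ v_perp) <= lam * np.linalg.norm(v_perp) + 1e-12

# <u, E u> = (1/N) sum_v e^{t f(v)} <= 1 + t^2 for mean-zero f, |f| <= 1, t <= 1.
E = np.diag(np.exp(t * f))
mgf_step = u @ E @ u
print(shrink_ok, mgf_step, 1 + t**2)
```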
-
A small dynamical system
v = (EP)^k u,   d∥ = ⟨v, u⟩,   d⊥ = ‖Π⊥ v‖ (the mass of v orthogonal to u)
-
A small dynamical system
One step v ↦ EPv updates (d∥, d⊥) by at most:
  d∥ ← e^{t²}·d∥ + λt·d⊥
  d⊥ ← t·d∥ + λe^t·d⊥
For λ = 0 the d⊥ contributions vanish; for t ≤ log(1/λ), the factor λe^t is at most 1.
Any mass that leaves the u-direction gets shrunk by λ.
-
Analyzing dynamical system gives
Thm [Gillman'94]: Suppose G = (V, E) is a regular graph with transition matrix P which has second eigenvalue λ. Let f: V → ℝ be a function with |f(v)| ≤ 1 and Σᵥ f(v) = 0. Then, if v₁, …, vₙ is a stationary random walk:
Pr[ |1/n Σᵢ f(vᵢ)| ≥ ε ] ≤ 2 exp(−Ω((1 − λ) n ε²))
-
Generalization to Matrices?
Setup: f: V → ℂ^{d×d} Hermitian-valued, random walk v₁, …, vₙ.
Goal: E tr exp(t Σᵢ f(vᵢ)) ≤ d · exp(O(n t² / (1 − λ))), where e^A is defined as a power series.
Main issue: exp(A + B) ≠ exp(A) exp(B) unless [A, B] = 0, so we can't express exp(sum) as an iterated product.
-
The Golden-Thompson Inequality
Partial Workaround [Golden-Thompson'65]:
tr exp(A + B) ≤ tr exp(A) exp(B)
Sufficient for the independent case by induction. For the expander case, we need this for n matrices. False!
There are Hermitian A, B, C with tr(e^{A+B+C}) > 0 > tr(e^A e^B e^C).
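The two-matrix inequality is easy to test numerically (an illustrative sketch, using an eigendecomposition-based matrix exponential):

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_hermitian(d):
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

def expm_h(H):
    # Matrix exponential of a Hermitian matrix via eigendecomposition.
    w, U = np.linalg.eigh(H)
    return (U * np.exp(w)) @ U.conj().T

# tr exp(A+B) <= tr exp(A) exp(B) for Hermitian A, B (Golden-Thompson).
gt_holds = True
for _ in range(20):
    A, B = rand_hermitian(4), rand_hermitian(4)
    lhs = np.trace(expm_h(A + B)).real
    rhs = np.trace(expm_h(A) @ expm_h(B)).real  # always positive
    gt_holds = gt_holds and (lhs <= rhs * (1 + 1e-9))
print(gt_holds)
```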
-
Key Ingredient
[Sutter-Berta-Tomamichel'16] If A₁, …, Aₙ are Hermitian, then
log tr e^{A₁+⋯+Aₙ} ≤ ∫ dβ(b) log tr[ M_b M_b† ],  where M_b = e^{(1+ib)A₁/2} ⋯ e^{(1+ib)Aₙ/2}
and β(b) is an explicit probability density on ℝ.
1. The matrix on the RHS is always PSD.
2. An average-case inequality: the e^{Aⱼ/2} are conjugated by unitaries.
3. Implies Lieb's concavity, the triple-matrix inequality, ALT, and more.
-
Proof of SBT: Lie-Trotter Formula
e^{A+B+C} = lim_{θ→0⁺} ( e^{θA} e^{θB} e^{θC} )^{1/θ}
log tr e^{A+B+C} = lim_{θ→0⁺} (2/θ) log ‖G(θ)‖_{2/θ}
for G(z) ≔ e^{zA/2} e^{zB/2} e^{zC/2}, where ‖·‖_{2/θ} is the Schatten norm.
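The limit can be spot-checked numerically at a small θ (an illustrative sketch: e^{cH} is computed via eigendecomposition, the Schatten norm via singular values in log space, and the tolerance is chosen loosely):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_hermitian(d, scale=0.5):
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    H = (M + M.conj().T) / 2
    return scale * H / np.linalg.norm(H, 2)  # normalize so ||H|| = scale

def expm_c(H, c):
    # e^{cH} for Hermitian H and any scalar c, via eigendecomposition of H.
    w, U = np.linalg.eigh(H)
    return (U * np.exp(c * w)) @ U.conj().T

A, B, C = (rand_hermitian(3) for _ in range(3))
lhs = np.log(np.trace(expm_c(A + B + C, 1.0)).real)

theta = 1e-3
G = expm_c(A, theta / 2) @ expm_c(B, theta / 2) @ expm_c(C, theta / 2)
sv = np.linalg.svd(G, compute_uv=False)
# (2/theta) * log ||G(theta)||_{2/theta} = log sum_i sv_i^{2/theta}, in log space.
rhs = np.log(np.sum(np.exp((2 / theta) * np.log(sv))))
print(lhs, rhs)
```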
-
Complex Interpolation (Stein-Hirschman)
For each θ ∈ (0, 1), find an analytic F(z) on the strip 0 ≤ Re z ≤ 1 such that:
|F(θ)| = ‖G(θ)‖_{2/θ}
|F(it)| = 1
|F(1 + it)| ≤ ‖G(1 + it)‖₂
Then log |F(θ)| ≤ ∫ dμ₀(t) log |F(it)| + ∫ dμ₁(t) log |F(1 + it)| for suitable densities μ₀, μ₁, and take lim θ → 0.
-
Issue. SBT involves integration over an unbounded region, which is bad for Taylor expansion.
-
Bounded Modification of SBT
Solution. Prove a bounded version of SBT by replacing the strip with a half-disk.
[Thm] If A₁, …, Aₙ are Hermitian, then
log tr e^{A₁+⋯+Aₙ} ≤ ∫ dβ(φ) log tr[ M_φ M_φ† ],  where M_φ = e^{e^{iφ}A₁/2} ⋯ e^{e^{iφ}Aₙ/2}
and β(φ) is an explicit probability density on [−π/2, π/2].
Proof. Analytic F(z) + Poisson kernel + Riemann map.
-
Handling Two-sided Products
Issue. Two-sided rather than one-sided products:
tr[ e^{t f(v₁) e^{iφ}/2} ⋯ e^{t f(vₙ) e^{iφ}/2} · e^{t f(vₙ) e^{−iφ}/2} ⋯ e^{t f(v₁) e^{−iφ}/2} ]
Solution. Encode as a one-sided product using the vectorization identity tr(A B) = ⟨vec(Id), (A ⊗ Bᵀ) vec(Id)⟩:
  = ⟨ ( e^{t f(v₁) e^{iφ}/2} ⊗ (e^{t f(v₁) e^{−iφ}/2})ᵀ ) ⋯ ( e^{t f(vₙ) e^{iφ}/2} ⊗ (e^{t f(vₙ) e^{−iφ}/2})ᵀ ) vec(Id), vec(Id) ⟩
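The vectorization trick can be verified numerically (an illustrative sketch; M_i plays the role of e^{t f(vᵢ) e^{iφ}/2}, so M_i† is the e^{t f(vᵢ) e^{−iφ}/2} factor, and M_i† transposed is conj(M_i)):

```python
import numpy as np

rng = np.random.default_rng(6)

def rand_hermitian(d):
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

def expm_c(H, c):
    # e^{cH} for Hermitian H and scalar c, via eigendecomposition of H.
    w, U = np.linalg.eigh(H)
    return (U * np.exp(c * w)) @ U.conj().T

d, n, t, phi = 3, 4, 0.3, 0.7
Ms = [expm_c(rand_hermitian(d), t * np.exp(1j * phi) / 2) for _ in range(n)]

# Two-sided product: tr(M_1 ... M_n M_n^dag ... M_1^dag).
prod = np.eye(d, dtype=complex)
for m in Ms:
    prod = prod @ m
lhs = np.trace(prod @ prod.conj().T).real

# One-sided encoding: <vec(Id), prod_i (M_i kron conj(M_i)) vec(Id)>.
vec_id = np.eye(d, dtype=complex).flatten()
big = np.eye(d * d, dtype=complex)
for m in Ms:
    big = big @ np.kron(m, m.conj())
rhs = (vec_id @ big @ vec_id).real
print(lhs, rhs)
```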
-
Finishing the Proof
Carry out a version of Healy's argument with P ⊗ I_{d²} in place of P, with u ⊗ vec(Id)/√d in place of u, and with
E = diag_v( e^{t f(v) e^{iφ}/2} ⊗ (e^{t f(v) e^{−iφ}/2})ᵀ ).
This leads to the additional factor of d.
-
Main Theorem
Thm. Suppose G = (V, E) is a regular graph with transition matrix P which has second eigenvalue λ. Let f: V → ℂ^{d×d} be a function taking Hermitian values with ‖f(v)‖ ≤ 1 and Σᵥ f(v) = 0. Then, if v₁, …, vₙ is a stationary random walk:
Pr[ ‖1/n Σᵢ f(vᵢ)‖ ≥ ε ] ≤ 2d exp(−Ω((1 − λ) n ε²))
-
Open Questions
Other matrix concentration inequalities (multiplicative, low-rank, moments)
Other Banach spaces (Schatten norms)
More applications of complex interpolation