on the power of semidefinite programming …on the power of semidefinite programming hierarchies...
TRANSCRIPT
On the Power of Semidefinite Programming Hierarchies
Prasad Raghavendra Georgia Institute of Technology,
Atlanta, GA.
David Steurer Microsoft Research New England
Cambridge, MA
Overview
• Background and Motivation
• Introduction to SDP Hierarchies (Lasserre SDP hierarchy)
• Rounding SDP hierarchies via Global Correlation.
BREAK • Graph Spectrum and Small-Set Expansion.
• Sum of Squares Proofs.
Graph Spectrum &
Small-Set Expansion
Question: If 𝐺 is a 𝛿-small set expander, how many eigenvalues larger than 1 − 𝜖 can the adjancency matrix 𝐴 have?
𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛(𝑆) = # 𝑒𝑑𝑔𝑒𝑠 𝑙𝑒𝑎𝑣𝑖𝑛𝑔 𝑆
𝑑 |𝑆|
vertex set S A random neighbor of a random vertex in 𝑆
is outside of 𝑆 with probability 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛(𝑆)
A Question in Spectral Graph Theory
d-regular graph G
Def (Small Set Expander): A regular graph G is a δ-small set expander if for every set S ⊂ 𝑉,
S ≤ 𝛿𝑁 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛 𝑆 ≥1
2
Let 𝐴 be the normalized adjacency matrix of the d-regular graph 𝐺,
(all entries are 0 𝑜𝑟1
𝑑)
𝐴 has 𝑛 eigenvalues 𝜆1 = 1 ≥ 𝜆2 ≥ 𝜆3 ≥ ⋯ ≥ 𝜆𝑁 (say 𝛿 = 10−6)
A Question in Spectral Graph Theory
Def (Small Set Expander): A regular graph G is a δ-small set expander if for every set S ⊂ 𝑉
S ≤ 𝛿𝑁 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛 𝑆 ≥1
2
Question: If 𝐺 is a 𝛿-small set expander, How many eigenvalues larger than 1 − 𝜖 can the normalized adjancency matrix 𝐴 have?
Cheeger’s inequality: every large eigenvalue a sparse cut in 𝐺
(𝜆𝑖 ≥ 1 − 𝜖) ( cut of sparsity O 𝜖 )
d-regular graph G
Intuitively, How many sparse cuts can a graph 𝐺 have without having a unbalanced sparse cut?
Def (Small Set Expander): A regular graph G is a δ-small set expander if for every set S ⊂ 𝑉
S ≤ 𝛿𝑁 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛 𝑆 ≥1
2
Question: If 𝐺 is a 𝛿-small set expander, How many eigenvalues larger than 1 − 𝜖 can the normalized adjacency matrix 𝐴 have?
Significance of the Question
𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 𝑟𝑎𝑛𝑘1−𝜖 𝐺 ≝ # of eigenvalues of graph G that are ≥ 1 − 𝜖
[Barak-Raghavendra-Steurer][Guruswami-Sinop] 𝑘-round Lasserre SDP solves 𝑈𝐺 𝜖 on graphs with low threshold rank ≤ 𝑘.
Graph 𝐺 has small non-expanding sets, decompose 𝐺 in to smaller pieces and solve each piece.
If UGC is true, there must be hard instances of Unique Games that a) Have high threshold rank b) Are Small-Set Expanders.
Def (Small Set Expander): A regular graph G is a δ-small set expander if for every set S ⊂ 𝑉
S ≤ 𝛿𝑁 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛 𝑆 ≥1
2
Question: If 𝐺 is a 𝛿-small set expander, How many eigenvalues larger than 1 − 𝜖 can the normalized adjacency matrix 𝐴 have?
𝑅 𝜖, 𝛿 ≝ Answer to the above question (a function of 𝑁, 𝜖, 𝛿)
Significance of the Question
[Arora-Barak-Steurer 2010]
There is a N𝑅 𝜖,𝛿 -time algorithm for GAP-SMALL-SET-EXPANSION problem – a problem closely related to UNIQUE GAMES.
Gap Small Set Expansion Problem (GapSSE)(𝜖, 𝛿) Given a graph G and constants 𝛿, 𝜖 > 0,
Is Ф(𝛿) < 𝜖 OR Ф(𝛿) > 1 − 𝜖? where Φ 𝛿 =minimum expansion of sets of size ≤ 𝛿
[Arora-Barak-Steurer 2010] 𝑅 𝜖, 𝛿 ≤ 𝑁𝜖 A subexponential-time algorithm for GAP-SMALL-SET-EXPANSION problem
At the time,
Best known lower bound for 𝑅 𝜖, 𝛿 = log 𝑁 (GAP-SMALL-SET-EXPANSION problem could be solved in quasipolynomial time.)
[Barak-Gopalan-Hastad-Meka-Raghavendra-Steurer]
For all small constant 𝛿, There exists a graph (the Short Code Graph) that is a 𝛿 −small set
expander with exp (log𝛽 𝑛) eigenvalues ≥ 1 − 𝜖. , 𝑖. 𝑒. ,
𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 𝑟𝑎𝑛𝑘1−𝜖 𝐺 ≥ exp (log𝛽 𝑁) for some 𝛽 depending on 𝜖.
Short Code Graph
• Led to new gadgets for hardness reductions (derandomized Majority is Stablest) and new SDP integrality gaps.
• An interesting mix of techniques from, Discrete Fourier analysis, Locally Testable Codes and Derandomization.
[BGHMRS 11] exp (log𝛽 𝑁) ≤ 𝑅 𝜖, 𝛿 ≤ 𝑁𝜖 [ABS]
Long Code Graph
• Eigenvectors • Small Set Expansion
Short Code Graph
Overview
The Long Code Graph aka Noisy Hypercube
Noise Graph: 𝐻𝜖 Vertices: −1,1 𝑛 Edges: Connect every pair of points in hypercube separated by a Hamming distance of 𝜖𝑛
n dimensional hypercube : {-1,1}n
x
y
𝑥 and 𝑦 differ in 𝜖𝑛 coordinates.
Dictator cuts: Cuts parallel to the Axis (given by 𝐹(𝑥) = 𝑥𝑖)
The dictator cuts yield 𝑛 sparse cuts in graph 𝐻𝜖
1
1 1
1
-1 -1
-1
Eigenvectors are functions on −1,1 𝑛
Sparsity of Dictator Cuts
n dimensional hypercube
Connect every pair of vertices in hypercube separated by Hamming distance of 𝝐𝒏
𝐹𝑟𝑎𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝑒𝑑𝑔𝑒𝑠 𝑐𝑢𝑡 𝑏𝑦 𝑓𝑖𝑟𝑠𝑡 𝑑𝑖𝑐𝑡𝑎𝑡𝑜𝑟 = Pr
𝑟𝑎𝑛𝑑𝑜𝑚 𝑒𝑑𝑔𝑒 (𝑥,𝑦)[ 𝑥, 𝑦 𝑖𝑠 𝑐𝑢𝑡]
= Pr𝑟𝑎𝑛𝑑𝑜𝑚 𝑒𝑑𝑔𝑒 (𝑥,𝑦)
[𝑥1 ≠ 𝑦1]
= 𝜖
Dictator cuts: 𝑛-eigenvectors with eigenvalues 1 − 𝜖 for graph 𝐻𝜖 (Number of vertices N = 2𝑛, so #of eigenvalues = log 𝑁)
1
1
1
1
-1 -1
-1
Eigenfunctions for Noisy Hypercube Graph
Eigenfunction Eigenvalue 𝐹1 𝑥 = 𝑥1, 𝐹2 𝑥 = 𝑥2, … , 𝐹𝑛 𝑥 = 𝑥𝑛 1 − 𝜖 𝐹12 𝑥 = 𝑥1𝑥2, 𝐹23 𝑥 = 𝑥2𝑥3, … 𝐹𝑛−1𝑛 𝑥 = 𝑥𝑛−1 𝑥𝑛 1 − 𝜖 2 ……………………… Degree d multilinear polynomials 1 − 𝜖 𝑑
Eigenfunctions for the Noisy hypercube graph are multilinear polynomials of fixed degree! (Noisy hypercube is a Cayley graph on 𝑍2
𝑛 , therefore its eigen functions are
characters of the group )
Suppose 𝑓 = 𝟙𝑆 indicator function of a small non-expanding set. S has 𝛿-fraction of vertices 𝑓 2
2= 𝐸 𝑓2 = 𝛿
Fraction of edges inside 𝑆 = Erandom edge x,y f x f y
= 𝑓, 𝐺𝑓 If 𝑒𝑥𝑝𝑎𝑛𝑠𝑖𝑜𝑛 𝑆 ≤ 0.001 then, at least a 0.999𝛿-fraction of edges are inside 𝑆. So,
𝑓, 𝐺𝑓 ≥ 0.999 𝑓 22
(f is close to the span of eigenvectors of G with eigenvalue ≥ 0.99)
Small-Set Expansion (SSE)
regular graph 𝐺 with vertex set 𝑉, parameter 𝛿 > 0 Given:
𝑆
Conclusion: Indicator function of a small non-expanding set f = 𝟙𝑆 is a • sparse vector • close to the span of the large eigenvectors of G
Hypercontractivity
Definition: (Hypercontractivity) A subspace 𝑆 ∈ 𝑅𝑁 is hypercontractive if for all 𝑤 ∈ 𝑆
𝑤 4 ≤ 𝐶 𝑤 2 Projector 𝑃𝑆 in to the subspace 𝑆, also called hypercontractive.
Hypercontractivity implies Small-Set Expansion 𝑃1−𝜖 = projector into span of eigenvectors of 𝐺 with eigenvalue ≥ 1 − 𝜖 𝑃1−𝜖 is hypercontractive No sparse vector in span of top eigenvectors of G No small non-expanding set in 𝐺. (G is a small set expander)
(No-Sparse-Vectors) Roughly, No sparse vectors in a hypercontractive subspace 𝑆 because, w is 𝛿-sparse “⇔” 𝑤 4/ 𝑤 2 > 1/𝛿1/4
Hypercontractivity for Noisy Hypercube
Top eigenfunctions of noisy hypercube are low degree polynomials.
(Hypercontractivity of Low Degree Polynomials) For a degree 𝑑 multilinear polynomial 𝑓 on −1,1 𝑛, 𝑓 4 ≤ 9𝑑 𝑓 2
(Noisy Hypercube is a Small-Set Expander) For constant 𝜖, the noisy hypercube is a small-set expander. Moreover, the noisy hypercube has 𝑁 = 2𝑛 vertices and 𝑛 eigenvalues larger than 1 − 𝜖.
[Barak-Gopalan-Hastad-Meka-Raghavendra-Steurer 2009]
For all small constant 𝛿, There exists a graph (the Short Code Graph) that is a 𝛿 −small set
expander with exp (log𝛽 𝑛) eigenvalues ≥ 1 − 𝜖. , 𝑖. 𝑒. ,
𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 𝑟𝑎𝑛𝑘1−𝜖 𝐺 ≥ exp (log𝛽 𝑁) for some 𝛽 depending on 𝜖.
Short Code Graph
Noise Graph: 𝐻𝜖 Vertices: −1,1 𝑛 Edges: Connect every pair of points in hypercube separated by a Hamming distance of 𝜖𝑛
x 1
-1
Short Code Graph
Has 𝑛 sparse cuts, but 𝑁 = 2𝑛 vertices -- too many vertices!
Idea: Pick a subset of vertices of the long code graph, and their induced subgraph. 1. The dictator cuts still yield 𝑛-sparse cuts 2. The subgraph is a small-set expander!
If we reduce 𝑁 = 2𝑛 to 𝑁 = 𝑛100 then, the number of eigenvalues will be 𝑁1
100 Choice: Reed Muller Codewords of large constant degree.
Sparsity of Dictator Cuts
n dimensional hypercube
Connect every pair of vertices in hypercube separated by Hamming distance of 𝝐𝒏
𝐹𝑟𝑎𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝑒𝑑𝑔𝑒𝑠 𝑐𝑢𝑡 𝑏𝑦 𝑓𝑖𝑟𝑠𝑡 𝑑𝑖𝑐𝑡𝑎𝑡𝑜𝑟 = Pr
𝑟𝑎𝑛𝑑𝑜𝑚 𝑒𝑑𝑔𝑒 (𝑥,𝑦)[ 𝑥, 𝑦 𝑖𝑠 𝑐𝑢𝑡]
= Pr𝑟𝑎𝑛𝑑𝑜𝑚 𝑒𝑑𝑔𝑒 (𝑥,𝑦)
[𝑥1 ≠ 𝑦1]
= 𝜖
Easy: Pretty much for any reasonable subset of vertices, dictators will be sparse cuts.
1
1
1
1
-1 -1
-1
Preserving Small Set Expansion
Top eigenfunctions of noisy hypercube are low degree polynomials.
(Hypercontractivity of Low Degree Polynomials) For a degree 𝑑 multilinear polynomial 𝑓 on −1,1 𝑛, 𝑓 4 ≤ 9𝑑 𝑓 2
(Noisy Hypercube is a Small-Set Expander) For constant 𝜖, the noisy hypercube is a small-set expander.
+
Preserving Small Set Expansion (Hypercontractivity of Low Degree Polynomials) For a degree 𝑑 multilinear polynomial 𝑓 on −1,1 𝑛, 𝑓 4 ≤ 9𝑑 𝑓 2
For a degree 𝑑 polynomial 𝑓, By hypercontractivity over hypercube,
𝐸𝑥∈ −1,1 𝑛 𝑓 𝑥 4 ≤ 94𝑑 𝐸𝑥∈ −1,1 𝑛 𝑓 𝑥2
We picked a subset 𝑆 ⊂ −1,1 𝑛 and so we want, 𝐸𝑥∈𝑆 𝑓 𝑥 4 ≤ 94𝑑 𝐸𝑥∈𝑆 𝑓 𝑥 2 𝑓 is degree d, so 𝑓4 and 𝑓2 are degree ≤ 4𝑑 . If S is a 4d-wise independent set then,
𝐸𝑥∈𝑆 𝑓 𝑥 4 = 𝐸𝑥∈ −1,1 𝑛 𝑓 𝑥 4
≤ 94𝑑 𝐸𝑥∈ −1,1 𝑛 𝑓 𝑥2
= 94𝑑 𝐸𝑥∈𝑆 𝑓 𝑥 2
Preserving Small Set Expansion
Top eigenfunctions of noisy hypercube are low degree polynomials.
We Want: Only top eigenfunctions on the subgraph of noisy hypercube are also low degree polynomials.
Connected to local-testability of the dual of the underlying code S!
We appeal to local testability result of Reed-Muller codes [Bhattacharya-Kopparty-Schoenebeck-Sudan-Zuckermann]
Applications of Short Code
[Barak-Gopalan-Hastad-Meka-Raghavendra-Steurer 2009]
`Majority is Stablest’ theorem holds for the short code. -- More efficient gadgets for hardness reductions. -- Stronger integrality gaps for SDP relaxations.
Recap: ``How many large eigenvalues can a small set expander have?’’ -- A graph construction led to better hardness gadgets and SDP integrality gaps.
[Kane-Meka]
A 2 log log 𝑛 1/3 SDP gap with triangle inequalities for BALANCED
SEPARATOR
On the Power of Sum-of-Squares Proof
Level-8 SoS relaxation refutes UG instances based on long-code and short-code graphs
Result:
SoS hierarchy is a natural candidate algorithm for refuting UGC
Should try to prove that this algorithm fails on some instances
Only candidate instances were based on long-code or short-code graph
We don’t know any instances on which this algorithm could potentially fail!
Show in this proof system that no assignments for these instances exist
How to prove it? (rounding algorithm?)
Interpret dual as proof system
Level-8 SoS relaxation refutes UG instances based on long-code and short-code graphs
Result:
qualitative difference to other hierarchies: basis independence
Try to lift this proof to the proof system
We already know “regular” proof of this fact! (soundness proof)
Sum-of-Squares Proof System
𝑃1 𝑧 ≥ 0
𝑃𝑚 𝑧 ≥ 0
Axioms
…
derive
𝑄 𝑧 ≤ 𝑐
(informal)
Rules
Polynomial operations “Positivstellensatz” [Stengel’74]
𝑅 𝑧 2 ≥ 0 for any polynomial 𝑅
Intermediate polynomials have bounded degree
(𝑃1, … , 𝑃𝑚, 𝑄 bounded-degree polynomials)
(c.f. bounded-width resolution, but basis independent)
1 − 𝑧 = 𝑧 − 𝑧2 + 1 − 𝑧 2
≥ 𝑧 − 𝑧2
Axiom: 𝑧2 ≤ 𝑧 Derive: 𝑧 ≤ 1
(non-negativity of squares)
≥ 0 (axiom)
Example
Non-serious issues:
Serious issues:
Cauchy–Schwarz / Hölder
Hypercontractivity
Invariance Principle
Influence decoding
can use variant of inductive proof, works in Fourier basis
typically uses bump functions, but for UG, polynomials suffice
Components of soundness proof (for known UG instances)
𝑃 = projector into span of eigenfunctions of 𝐺 with eigenvalue ≥ 𝜆 = 0.1
G = long-code graph Cay(𝔽2𝑚, 𝑇) where 𝑇 = {points with Hamming weight 휀𝑚}
2𝑂 1/ ‖𝑓‖24 − ‖𝑃𝑓‖4
4 is a sum of squares
SoS proof of hypercontractivity:
𝑃 = projector into span of eigenfunctions of 𝐺 with eigenvalue ≥ 𝜆 = 0.1
G = long-code graph Cay(𝔽2𝑚, 𝑇) where 𝑇 = {points with Hamming weight 휀𝑚}
‖𝑃𝑓‖44 ≼ 2𝑂 1/ ‖𝑓‖2
4
For long-code graph, 𝑃 projects into Fourier polynomials with degree 𝑂(1/휀)
Stronger ind. Hyp.:
where 𝑓 is a generic degree-𝑑 Fourier polynomial and 𝑔 is a generic degree-𝑒 Fourier polynomial
𝐄 𝑓2𝑔2 ≼ 3𝑑+𝑒𝐄𝑓2 ⋅ 𝐄𝑔2
difference is sum of squares SoS proof of hypercontractivity:
𝐄𝑓2 = 𝑓 𝑆2
𝑆, 𝑆 ≤𝑑
𝑃 = projector into span of eigenfunctions of 𝐺 with eigenvalue ≥ 𝜆 = 0.1
G = long-code graph Cay(𝔽2𝑚, 𝑇) where 𝑇 = {points with Hamming weight 휀𝑚}
‖𝑃𝑓‖44 ≼ 2𝑂 1/ ‖𝑓‖2
4
For long-code graph, 𝑃 projects into Fourier polynomials with degree 𝑂(1/휀)
Stronger ind. Hyp.:
where 𝑓 is a generic degree-𝑑 Fourier polynomial and 𝑔 is a generic degree-𝑒 Fourier polynomial
𝐄 𝑓2𝑔2 ≼ 3𝑑+𝑒𝐄𝑓2 ⋅ 𝐄𝑔2
Write 𝑓 = 𝑓0 + 𝑥1 ⋅ 𝑓1 and 𝑔 = 𝑔0 + 𝑥1 ⋅ 𝑔1 (degrees of 𝑓1, 𝑔1 smaller than 𝑑, 𝑒)
𝐄 𝑓2𝑔2 = 𝐄 𝑓02𝑔0
2 + 𝐄 𝑓12𝑔0
2 + 𝐄 𝑓02𝑔1
2 + 𝐄 𝑓12𝑔1
2 + 4𝐄 𝑓0𝑓1𝑔0𝑔1
≼ … + 2𝐄 𝑓02𝑔1
2 + 2𝐄 𝑓12𝑔0
2
≼ 3𝑑+𝑒(𝐄𝑓02 + 𝐄𝑓1
2) ⋅ (𝐄𝑔02 + 𝐄𝑔1
2) (ind. hyp.)
SoS proof of hypercontractivity:
Open Questions
Does 8 rounds of Lasserre hierarchy disprove UGC?
Can we make the short code, any shorter? (applications to hardness gadgets)
Subexponential time algorithms for MaxCut or Vertex Cover (beating the current ratios)
Thanks!