Living on the Edge
Phase Transitions in Convex Programs with Random Data
Joel A. Tropp and Michael B. McCoy
Computing + Mathematical Sciences
California Institute of Technology
Joint with Dennis Amelunxen and Martin Lotz (Manchester)
Research supported in part by ONR, AFOSR, DARPA, and the Sloan Foundation
Convex Programs with Random Data
Examples...
§ Stat and ML. Random data models; fit model via optimization
§ Sensing. Collect random measurements; reconstruct via optimization
§ Coding. Random channel models; decode via optimization
Motivations...
§ Average-case analysis. Randomness describes “typical” behavior
§ Fundamental bounds. Opportunities and limits for convex methods
Living on the Edge, Modern Time–Frequency Analysis, Strobl, 3 June 2014 2
Research Challenge...
Understand and predict precise behavior
of random convex programs
References: Donoho–Maleki–Montanari 2009, Donoho–Johnstone–Montanari 2011, Donoho–Gavish–Montanari 2013
A Theory Emerges...
§ Vershik & Sporyshev, “An asymptotic estimate for the average number of steps...” 1986
§ Donoho, “High-dimensional centrally symmetric polytopes...” 2/2005
§ Rudelson & Vershynin, “On sparse reconstruction...” 2/2006
§ Donoho & Tanner, “Counting faces of randomly projected polytopes...” 5/2006
§ Xu & Hassibi, “Compressed sensing over the Grassmann manifold...” 9/2008
§ Stojnic, “Various thresholds for ℓ1 optimization...” 7/2009
§ Bayati & Montanari, “The LASSO risk for Gaussian matrices” 8/2010
§ Oymak & Hassibi, “New null space results and recovery thresholds...” 11/2010
§ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems” 12/2010
§ McCoy & Tropp, “Sharp recovery bounds for convex demixing...” 5/2012
§ Bayati, Lelarge, & Montanari, “Universality in polytope phase transitions...” 7/2012
§ Chandrasekaran & Jordan, “Computational & statistical tradeoffs...” 10/2012
§ Amelunxen, Lotz, McCoy, & Tropp, “Living on the edge...” 3/2013
§ Stojnic, various works 3/2013
§ Foygel & Mackey, “Corrupted sensing: Novel guarantees...” 5/2013
§ Oymak & Hassibi, “Asymptotically exact denoising...” 5/2013
§ McCoy & Tropp, “From Steiner formulas for cones...” 8/2013
§ McCoy & Tropp, “The achievable performance of convex demixing...” 9/2013
§ Oymak, Thrampoulidis, & Hassibi, “The squared-error of generalized LASSO...” 11/2013
The Core Question
How big is a cone?
Regularized Denoising
Denoising a Piecewise Smooth Signal
[Figure: two panels plotted against Time on [0, 1], with Value on the vertical axis. First panel, “Piecewise smooth function + additive white noise”: the original signal and the original plus noise. Second panel, “Denoised piecewise smooth function”: the original signal and the estimate obtained by wavelet shrinkage.]
§ Observation: $z = x^\natural + \sigma g$ where $g \sim \mathrm{normal}(0, I)$
§ Denoise via wavelet shrinkage = convex optimization:
$$\text{minimize} \quad \tfrac{1}{2} \|x - z\|_2^2 + \lambda \|Wx\|_1$$
Reference: Donoho & Johnstone, early 1990s
Setup for Regularized Denoising
§ Let $x^\natural \in \mathbb{R}^d$ be “structured” but unknown
§ Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function that reflects “structure”
§ Observe $z = x^\natural + \sigma g$ where $g \sim \mathrm{normal}(0, I)$
§ Remove noise by solving the convex program*
$$\text{minimize} \quad \tfrac{1}{2} \|x - z\|_2^2 \quad \text{subject to} \quad f(x) \le f(x^\natural)$$
§ Hope: The minimizer $\hat{x}$ approximates $x^\natural$
*We assume the side information $f(x^\natural)$ is available. This is equivalent** to knowing the optimal choice of Lagrange multiplier for the constraint.
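As one concrete instance of this setup, the sketch below takes $f$ to be the $\ell_1$ norm, so the convex program becomes a Euclidean projection of $z$ onto an $\ell_1$ ball of radius $\|x^\natural\|_1$. This choice of $f$, the sort-based projection routine, the function name `project_l1_ball`, and all problem sizes are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def project_l1_ball(z, radius):
    """Euclidean projection of z onto {x : ||x||_1 <= radius} (sort-based method)."""
    if radius <= 0:
        return np.zeros_like(z)
    if np.abs(z).sum() <= radius:
        return z.copy()                      # already feasible
    u = np.sort(np.abs(z))[::-1]             # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, z.size + 1)
    # last (0-based) index whose magnitude survives the soft threshold
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

# Hypothetical test problem: a sparse vector observed in Gaussian noise.
rng = np.random.default_rng(0)
d, s, sigma = 200, 10, 0.5
x_nat = np.zeros(d)
x_nat[rng.choice(d, size=s, replace=False)] = rng.choice([-3.0, 3.0], size=s)
z = x_nat + sigma * rng.standard_normal(d)

# Constrained denoising with the side information f(x_nat) = ||x_nat||_1.
x_hat = project_l1_ball(z, np.abs(x_nat).sum())
err_noisy = np.linalg.norm(z - x_nat)
err_denoised = np.linalg.norm(x_hat - x_nat)
print("error before:", err_noisy, "error after:", err_denoised)
```

Because $x^\natural$ itself is feasible and projection onto a convex set is nonexpansive, the denoised error can never exceed the error of the raw observation; how much smaller it gets is what the later slides quantify.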
Geometry of Denoising I
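[Figure: the sublevel set $\{x : f(x) \le f(x^\natural)\}$, the point $x^\natural$, the observation $z$ displaced from $x^\natural$ by the noise $\sigma g$, and the estimate $\hat{x}$.]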
References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013
Descent Cones
Definition. The descent cone of a function f at a point x is
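$$\mathcal{D}(f, x) := \{h : f(x + \varepsilon h) \le f(x) \text{ for some } \varepsilon > 0\}$$
[Figure: two panels. One shows the set $\{h : f(x + h) \le f(x)\}$ and the cone $\mathcal{D}(f, x)$ at the origin 0; the other shows the sublevel set $\{y : f(y) \le f(x)\}$ and the translated cone $x + \mathcal{D}(f, x)$ at the point $x$.]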
References: Rockafellar 1970, Hiriart-Urruty & Lemaréchal 1996
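To make the definition concrete, here is a small sketch of a membership test for the descent cone of the $\ell_1$ norm. It relies on the fact that the $\ell_1$ norm is piecewise linear, so $f(x + \varepsilon h) = f(x) + \varepsilon f'(x; h)$ for all sufficiently small $\varepsilon > 0$, and membership reduces to a nonpositive directional derivative. The choice of the $\ell_1$ norm and the function name `in_l1_descent_cone` are assumptions for illustration.

```python
import numpy as np

def in_l1_descent_cone(x, h, tol=1e-12):
    """Return True when h lies in the descent cone of the l1 norm at x.

    Uses the directional derivative of the l1 norm:
        f'(x; h) = sum_{i: x_i != 0} sign(x_i) * h_i + sum_{i: x_i == 0} |h_i|.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    on_support = x != 0
    directional = np.sum(np.sign(x[on_support]) * h[on_support])
    directional += np.sum(np.abs(h[~on_support]))
    return bool(directional <= tol)

x = np.array([2.0, -1.0, 0.0, 0.0])
print(in_l1_descent_cone(x, -x))                      # shrinking x decreases the norm
print(in_l1_descent_cone(x, [0.0, 0.0, 1.0, 0.0]))    # leaving the support increases it
print(in_l1_descent_cone(x, [-1.0, 0.0, 0.5, 0.0]))   # the loss on the support outweighs the gain off it
```

The sparser $x$ is, the more coordinates fall into the penalized off-support sum, so fewer directions qualify and the cone gets narrower.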
Geometry of Denoising II
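[Figure: the sublevel set $\{x : f(x) \le f(x^\natural)\}$ and the translated cone $x^\natural + K$, where $K = \mathcal{D}(f, x^\natural)$. The observation $z$ is displaced from $x^\natural$ by $\sigma g$; the estimate $\hat{x}$ is shown together with the projection $\Pi_K(\sigma g)$.]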
References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013
The Risk of Regularized Denoising
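Theorem 1. [Oymak & Hassibi 2013] Assume
§ We observe $z = x^\natural + \sigma g$ where $g$ is standard normal
§ The vector $\hat{x}$ solves
$$\text{minimize} \quad \tfrac{1}{2} \|z - x\|_2^2 \quad \text{subject to} \quad f(x) \le f(x^\natural)$$
Then
$$\sup_{\sigma > 0} \frac{\mathbb{E}\, \|\hat{x} - x^\natural\|_2^2}{\sigma^2} = \mathbb{E}\, \|\Pi_K(g)\|_2^2$$
where $K = \mathcal{D}(f, x^\natural)$ and $\Pi_K$ is the Euclidean metric projector onto $K$.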
Related: Bhaskar–Tang–Recht 2012, Donoho–Johnstone–Montanari 2012, Chandrasekaran & Jordan 2012
Statistical Dimension
Statistical Dimension: The Motion Picture
[Figure: two panels, “small cone” and “big cone”, each showing a cone $K$ with apex 0, a Gaussian vector $g$, and its projection $\Pi_K(g)$.]
The Statistical Dimension of a Cone
Definition. The statistical dimension $\delta(K)$ of a closed, convex cone $K$ is the quantity
$$\delta(K) := \mathbb{E}\big[\|\Pi_K(g)\|_2^2\big],$$
where
§ $\Pi_K$ is the Euclidean metric projector onto $K$
§ $g \sim \mathrm{normal}(0, I)$ is a standard normal vector
References: Rudelson & Vershynin 2006, Stojnic 2009, Chandrasekaran et al. 2010, Chandrasekaran & Jordan 2012, Amelunxen et al. 2013
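As a worked instance of the definition, the two simplest cones can be handled by direct calculation. The derivation below is a sketch that uses only the definition; the indicator notation $\mathbf{1}\{\cdot\}$ and the choice of coordinates adapted to the subspace are conventions introduced here for illustration.

```latex
% j-dimensional subspace L_j: in an orthonormal basis adapted to L_j,
% the projection keeps the first j coordinates of g.
\begin{align*}
\delta(L_j) &= \mathbb{E}\,\|\Pi_{L_j}(g)\|_2^2
             = \sum_{i=1}^{j} \mathbb{E}\,[g_i^2] = j. \\[4pt]
% Nonnegative orthant: the projection clips negative coordinates to zero,
% and by symmetry each coordinate contributes half of E[g_i^2] = 1.
\Pi_{\mathbb{R}^d_+}(g) &= \big(\max\{g_i, 0\}\big)_{i=1}^{d}, \\
\delta(\mathbb{R}^d_+) &= \sum_{i=1}^{d} \mathbb{E}\big[g_i^2\, \mathbf{1}\{g_i > 0\}\big]
                        = d \cdot \tfrac{1}{2}\, \mathbb{E}\,[g_1^2] = \tfrac{1}{2} d.
\end{align*}
```

For a subspace the statistical dimension therefore coincides with the ordinary dimension, which is the sense in which it measures how big a cone is.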
Basic Statistical Dimension Calculations
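Cone                   Notation               Statistical Dimension
j-dim subspace         $L_j$                  $j$
Nonnegative orthant    $\mathbb{R}^d_+$       $\tfrac{1}{2} d$
Second-order cone      $\mathbb{L}^{d+1}$     $\tfrac{1}{2}(d + 1)$
Real psd cone          $\mathbb{S}^d_+$       $\tfrac{1}{4} d(d + 1)$
Complex psd cone       $\mathbb{H}^d_+$       $\tfrac{1}{2} d^2$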
References: Chandrasekaran et al. 2010, Amelunxen et al. 2013
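The orthant and second-order cone entries in the table can be checked by simulation, since both cones have closed-form projections. The sketch below is a plain Monte Carlo estimate of $\mathbb{E}\,\|\Pi_K(g)\|_2^2$. The function names are hypothetical, the second-order cone is taken to be $\{(v, t) : \|v\|_2 \le t\}$ with $t$ stored as the last coordinate (an assumed convention), and the dimension and trial count are arbitrary.

```python
import numpy as np

def project_orthant(g):
    """Projection onto the nonnegative orthant: clip negative entries."""
    return np.maximum(g, 0.0)

def project_second_order_cone(g):
    """Projection onto {(v, t) : ||v||_2 <= t}; the last entry of g plays the role of t."""
    v, t = g[:-1], g[-1]
    nv = np.linalg.norm(v)
    if nv <= t:                       # already inside the cone
        return g.copy()
    if nv <= -t:                      # inside the polar cone: project to the apex
        return np.zeros_like(g)
    scale = (nv + t) / 2.0            # otherwise land on the boundary
    return np.concatenate([scale * v / nv, [scale]])

def statistical_dimension_mc(project, dim, trials=20000, seed=0):
    """Monte Carlo estimate of E ||Pi_K(g)||^2 for g ~ normal(0, I_dim)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        g = rng.standard_normal(dim)
        total += np.sum(project(g) ** 2)
    return total / trials

d = 20
print("orthant in R^d:        ", statistical_dimension_mc(project_orthant, d), "vs d/2 =", d / 2)
print("second-order in R^(d+1):", statistical_dimension_mc(project_second_order_cone, d + 1),
      "vs (d+1)/2 =", (d + 1) / 2)
```

The same estimator applies to any cone for which a projection routine is available; only the `project` argument changes.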
Circular Cones
[Figure: plot with axis ticks at 0, 1/4, 1/2, 3/4, 1.]
References: Amelunxen et al. 2013, Mu et al. 2013, McCoy & Tropp 2013
Descent Cone of ℓ1 Norm at Sparse Vector
[Figure: plot with both axes running from 0 to 1, ticks at 0, 1/4, 1/2, 3/4, 1.]
References: Stojnic 2009, Donoho & Tanner 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel & Mackey 2013
Descent Cone of S1 Norm at Low-Rank Matrix
[Figure: plot with both axes running from 0 to 1, ticks at 0, 1/4, 1/2, 3/4, 1.]
References: Oymak & Hassibi 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel & Mackey 2013
Regularized Linear Inverse Problems
Example: The Random Demodulator
[Figure: block diagram of the random demodulator with the labels “Input to sensor”, “Linear data acquisition system”, “Pseudorandom Number Generator” with its “Seed”, “Output from sensor”, and “Reconstruct (e.g., convex optimization)”. Two time-frequency panels accompany the diagram, with axes Time (μs) and Frequency (MHz).]
Reference: Tropp et al. 2010, ...
Setup for Linear Inverse Problems
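§ Let $x^\natural \in \mathbb{R}^d$ be a structured, unknown vector
§ Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function that reflects structure
§ Let $A \in \mathbb{R}^{m \times d}$ be a measurement operator
§ Observe $z = A x^\natural$
§ Find estimate $\hat{x}$ by solving convex program
$$\text{minimize} \quad f(x) \quad \text{subject to} \quad Ax = z$$
§ Hope: $\hat{x} = x^\natural$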
Geometry of Linear Inverse Problems
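[Figure: two panels, each showing the sublevel set $\{x : f(x) \le f(x^\natural)\}$, the point $x^\natural$, the translated descent cone $x^\natural + \mathcal{D}(f, x^\natural)$, and the affine space $x^\natural + \mathrm{null}(A)$. In the “Success!” panel the affine space meets the cone only at $x^\natural$; in the “Failure!” panel it passes through the cone.]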
References: Candes–Romberg–Tao 2005, Rudelson–Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013
Linear Inverse Problems with Random Data
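Theorem 2. [CRPW10; ALMT13] Assume
§ The vector $x^\natural \in \mathbb{R}^d$ is unknown
§ The observation $z = A x^\natural$ where $A \in \mathbb{R}^{m \times d}$ is standard normal
§ The vector $\hat{x}$ solves
$$\text{minimize} \quad f(x) \quad \text{subject to} \quad Ax = z$$
Then
$$m \gtrsim \delta\big(\mathcal{D}(f, x^\natural)\big) \implies \hat{x} = x^\natural \text{ whp [CRPW10; ALMT13]}$$
$$m \lesssim \delta\big(\mathcal{D}(f, x^\natural)\big) \implies \hat{x} \ne x^\natural \text{ whp [ALMT13]}$$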
References: Stojnic 2009, Chandrasekaran et al. 2010, Amelunxen et al. 2013
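The transition described by Theorem 2 can be observed in a small experiment with $f = \|\cdot\|_1$. The sketch below rewrites $\ell_1$ minimization as a linear program via the standard split $x = u - v$ with $u, v \ge 0$ and solves it with `scipy.optimize.linprog`. It assumes SciPy is installed; the function names, the sizes $d = 40$ and $s = 4$, the trial count, and the success tolerance are all illustrative choices, not the settings behind the plots in this talk.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, z):
    """Solve: minimize ||x||_1 subject to Ax = z, as an LP in (u, v) with x = u - v."""
    m, d = A.shape
    cost = np.ones(2 * d)                      # sum(u) + sum(v) equals ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])                  # A u - A v = z
    res = linprog(cost, A_eq=A_eq, b_eq=z, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

def success_rate(m, d=40, s=4, trials=20, seed=0):
    """Fraction of random trials in which l1 minimization returns x_nat."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x_nat = np.zeros(d)
        x_nat[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
        A = rng.standard_normal((m, d))        # standard normal measurement matrix
        x_hat = l1_recover(A, A @ x_nat)
        tol = 1e-4 * max(1.0, np.linalg.norm(x_nat))
        hits += int(np.linalg.norm(x_hat - x_nat) <= tol)
    return hits / trials

for m in (3, 10, 20, 35):
    print("m =", m, " empirical success rate:", success_rate(m))
```

With few measurements recovery essentially never succeeds, and with many it essentially always does; the theorem locates the changeover near the statistical dimension of the descent cone.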
Sparse Recovery via ℓ1 Minimization
[Figure: plot with both axes running from 0 to 100.]
Low-Rank Recovery via S1 Minimization
[Figure: plot with horizontal axis from 0 to 30 and vertical axis from 0 to 900.]
Demixing Structured Signals
Example: Demixing Spikes + Sines
[Figure: three panels plotted against Time (seconds), with Amplitude between −1 and 1 on the vertical axis: “Uy0: Signal with sparse DCT”, “x0: Sparse noise”, and “Noisy Signal”.]
Observation: $z = x^\natural + U y^\natural$ where $U$ is the DCT
References: Starck–Donoho–Candes 2002, Starck–Elad–Donoho 2004
Convex Demixing Yields...
[Figure: the demixed signal overlaid on the original, plotted against Time (seconds), with Amplitude between −1 and 1 on the vertical axis.]
$$\text{minimize} \quad \|x\|_1 + \lambda \|y\|_1 \quad \text{subject to} \quad z = x + Uy$$
References: Starck–Donoho–Candes 2002, Starck–Elad–Donoho 2004
Setup for Demixing Problems
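§ Let $x^\natural \in \mathbb{R}^d$ and $y^\natural \in \mathbb{R}^d$ be structured, unknown vectors
§ Let $f, g : \mathbb{R}^d \to \mathbb{R}$ be convex functions that reflect structure
§ Let $U \in \mathbb{R}^{d \times d}$ be a known orthogonal matrix
§ Observe $z = x^\natural + U y^\natural$
§ Demix via convex program
$$\text{minimize} \quad f(x) \quad \text{subject to} \quad g(y) \le g(y^\natural), \quad x + Uy = z$$
§ Hope: $(\hat{x}, \hat{y}) = (x^\natural, y^\natural)$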
Geometry of Demixing Problems
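[Figure: two panels, each showing the point $x^\natural$ and the two cones $x^\natural + \mathcal{D}(f, x^\natural)$ and $x^\natural - U\mathcal{D}(g, y^\natural)$. In the “Success!” panel the cones meet only at $x^\natural$; in the “Failure!” panel they overlap.]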
References: McCoy & Tropp 2012, Amelunxen et al. 2013, McCoy & Tropp 2013
Demixing Problems with Random Incoherence
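Theorem 3. [MT12, ALMT13] Assume
§ The vectors $x^\natural \in \mathbb{R}^d$ and $y^\natural \in \mathbb{R}^d$ are unknown
§ The observation $z = x^\natural + Q y^\natural$ where $Q$ is random orthogonal
§ The pair $(\hat{x}, \hat{y})$ solves
$$\text{minimize} \quad f(x) \quad \text{subject to} \quad g(y) \le g(y^\natural), \quad x + Qy = z$$
Then
$$\delta\big(\mathcal{D}(f, x^\natural)\big) + \delta\big(\mathcal{D}(g, y^\natural)\big) \lesssim d \implies (\hat{x}, \hat{y}) = (x^\natural, y^\natural) \text{ whp}$$
$$\delta\big(\mathcal{D}(f, x^\natural)\big) + \delta\big(\mathcal{D}(g, y^\natural)\big) \gtrsim d \implies (\hat{x}, \hat{y}) \ne (x^\natural, y^\natural) \text{ whp}$$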
Sparse + Sparse via ℓ1 + ℓ1 Minimization
[Figure: plot with both axes running from 0 to 100.]
Low-Rank + Sparse via S1 + ℓ1 Minimization
[Figure: plot with horizontal axis from 0 to 35 and vertical axis from 0 to 1225.]
Statistical Dimension & Phase Transitions
§ Key Question: When do two randomly oriented cones strike?
§ Intuition: When do two randomly oriented subspaces strike?
The Approximate Kinematic Formula
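Let $C$ and $K$ be closed convex cones in $\mathbb{R}^d$
$$\delta(C) + \delta(K) \lesssim d \implies \mathbb{P}\{C \cap QK = \{0\}\} \approx 1$$
$$\delta(C) + \delta(K) \gtrsim d \implies \mathbb{P}\{C \cap QK = \{0\}\} \approx 0$$
where $Q$ is a random orthogonal matrix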
References: Amelunxen et al. 2013, McCoy & Tropp 2013
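The subspace intuition behind the kinematic formula is easy to test numerically. For a subspace the statistical dimension equals the ordinary dimension, so the formula predicts that two randomly oriented subspaces of dimensions $j$ and $k$ in $\mathbb{R}^d$ intersect trivially when $j + k \le d$ and nontrivially when $j + k > d$. The sketch below checks this with a rank computation; the function names and sizes are hypothetical, and the random subspaces are generated from Gaussian matrices via QR.

```python
import numpy as np

def random_subspace_basis(d, k, rng):
    """Orthonormal basis (d x k) for the span of k Gaussian vectors in R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def intersect_trivially(d, j, k, rng):
    """True when random subspaces of dimensions j and k share only the zero vector."""
    basis = np.hstack([random_subspace_basis(d, j, rng),
                       random_subspace_basis(d, k, rng)])
    # The intersection is {0} exactly when all j + k basis vectors are independent.
    return bool(np.linalg.matrix_rank(basis) == j + k)

rng = np.random.default_rng(0)
d = 10
for j, k in [(3, 4), (4, 5), (6, 5), (8, 7)]:
    print(f"j + k = {j + k:2d}, d = {d}: trivial intersection?",
          intersect_trivially(d, j, k, rng))
```

For subspaces the changeover at $j + k = d$ is perfectly sharp. For general cones the formula above replaces dimension with statistical dimension, and the statement becomes approximate.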
To learn more...
E-mail: [email protected]
Web: http://users.cms.caltech.edu/~jtropp
Main Papers Discussed:
§ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems.” FOCM 2012
§ MT, “Sharp recovery bounds for convex deconvolution, with applications.” FOCM 2014
§ ALMT, “Living on the edge: A geometric theory of phase transitions in convex optimization.” I&I 2014
§ Oymak & Hassibi, “Asymptotically exact denoising in relation to compressed sensing,” arXiv cs.IT 1305.2714
§ MT, “From Steiner formulas for cones to concentration of intrinsic volumes.” DCG 2014
§ MT, “The achievable performance of convex demixing,” arXiv cs.IT 1309.7478
§ More to come!