Download - Living on the Edge - California Institute of Technologyusers.cms.caltech.edu/~jtropp/slides/Tro14-Living-Edge-Talk.pdfLiving on the Edge ƒ Phase Transitions in Convex Programs with

Living on the Edge¦

Phase Transitions in Convex Programs with Random Data

Joel A. TroppMichael B. McCoy

Computing + Mathematical Sciences

California Institute of Technology

Joint with Dennis Amelunxen and Martin Lotz (Manchester)

Research supported in part by ONR, AFOSR, DARPA, and the Sloan Foundation 1

Convex Programs with Random Data

Examples...

§ Stat and ML. Random data models; fit model via optimization

§ Sensing. Collect random measurements; reconstruct via optimization

§ Coding. Random channel models; decode via optimization

Motivations...

§ Average-case analysis. Randomness describes “typical” behavior

§ Fundamental bounds. Opportunities and limits for convex methods

Living on the Edge, Modern Time–Frequency Analysis, Strobl, 3 June 2014 2

Research Challenge...

Understand and predictprecise behavior

of random convex programs

References: Donoho–Maleki–Montanari 2009, Donoho–Johnstone–Montanari 2011, Donoho–Gavish–Montanari 2013


A Theory Emerges...

§ Vershik & Sporyshev, “An asymptotic estimate for the average number of steps...” 1986

§ Donoho, “High-dimensional centrally symmetric polytopes...” 2/2005

§ Rudelson & Vershynin, “On sparse reconstruction...” 2/2006

§ Donoho & Tanner, “Counting faces of randomly projected polytopes...” 5/2006

§ Xu & Hassibi, “Compressed sensing over the Grassmann manifold...” 9/2008

§ Stojnic, “Various thresholds for `1 optimization...” 7/2009

§ Bayati & Montanari, “The LASSO risk for gaussian matrices” 8/2010

§ Oymak & Hassibi, “New null space results and recovery thresholds...” 11/2010

§ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems” 12/2010

§ McCoy & Tropp, “Sharp recovery bounds for convex demixing...” 5/2012

§ Bayati, Lelarge, & Montanari, “Universality in polytope phase transitions...” 7/2012

§ Chandrasekaran & Jordan, “Computational & statistical tradeoffs...” 10/2012

§ Amelunxen, Lotz, McCoy, & Tropp, “Living on the edge...” 3/2013

§ Stojnic, various works 3/2013

§ Foygel & Mackey, “Corrupted sensing: Novel guarantees...” 5/2013

§ Oymak & Hassibi, “Asymptotically exact denoising...” 5/2013

§ McCoy & Tropp, “From Steiner formulas for cones...” 8/2013

§ McCoy & Tropp, “The achievable performance of convex demixing...” 9/2013

§ Oymak, Thrampoulidis, & Hassibi, “The squared-error of generalized LASSO...” 11/2013


The Core Question

How big is a cone?


.

RegularizedDenoising


Denoising a Piecewise Smooth Signal

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−4

−3

−2

−1

0

1

2

3

4Piecewise smooth function + additive white noise

Time

Val

ue

Original + noise

Original

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−4

−3

−2

−1

0

1

2

3Denoised piecewise smooth function

Time

Valu

e

By wavelet shrinkageOriginal

§ Observation: z = x\ + σg where g ∼ normal(0, I)

§ Denoise via wavelet shrinkage = convex optimization:

minimize1

2‖x− z‖22 + λ ‖Wx‖1

Reference: Donoho & Johnstone, early 1990s


Setup for Regularized Denoising

§ Let x\ ∈ Rd be “structured” but unknown

§ Let f : Rd → R be a convex function that reflects “structure”

§ Observe z = x\ + σg where g ∼ normal(0, I)

§ Remove noise by solving the convex program*

minimize1

2‖x− z‖22 subject to f(x) ≤ f(x\)

§ Hope: The minimizer x approximates x\

*We assume the side information f(x\) is available. This is equivalent** to knowing the

optimal choice of Lagrange multiplier for the constraint.


Geometry of Denoising I

{x : f(x) ≤ f(x\)}

zσg

x

x\

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013


Descent Cones

Definition. The descent cone of a function f at a point x is

D(f,x) := {h : f(x+ εh) ≤ f(x) for some ε > 0}

{h : f(x+ h) ≤ f(x)}

D(f,x)

0{y : f(y) ≤ f(x)}

x+ D(f,x)

x

References: Rockafellar 1970, Hiriary-Urruty & Lemarechal 1996


Geometry of Denoising II

{x : f(x) ≤ f(x\)}

x\ +K

zσg

xΠK(σg)

x\

K = D(f,x\)

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013


The Risk of Regularized Denoising

Theorem 1. [Oymak & Hassibi 2013] Assume

§ We observe z = x\ + σg where g is standard normal

§ The vector x solves

minimize1

2‖z − x‖22 subject to f(x) ≤ f(x\)

Then

supσ>0

E ‖x− x\‖22σ2

= E ‖ΠK(g)‖22

where K = D(f,x\) and ΠK is the Euclidean metric projector onto K.

Related: Bhaskar–Tang–Recht 2012, Donoho–Johnstone–Montanari 2012, Chandrasekaran & Jordan 2012


.

Statistical.Dimension


Statistical Dimension: The Motion Picture

K

g

ΠK(g)

0

small cone

K

g

ΠK(g)

0

big cone


The Statistical Dimension of a Cone

Definition. The statistical dimension δ(K) of a closed,

convex cone K is the quantity

δ(K) := E[‖ΠK(g)‖22

].

where

§ ΠK is the Euclidean metric projector onto K

§ g ∼ normal(0, I) is a standard normal vector

References: Rudelson & Vershynin 2006, Stojnic 2009, Chandrasekaran et al. 2010, Chandrasekaran & Jordan 2012, Amelunxen

et al. 2013


Basic Statistical Dimension Calculations

Cone Notation Statistical Dimension

j-dim subspace Lj j

Nonnegative orthant Rd+ 12d

Second-order cone Ld+1 12(d+ 1)

Real psd cone Sd+ 14d(d− 1)

Complex psd cone Hd+ 12d

2

References: Chandrasekaran et al. 2010, Amelunxen et al. 2013


Circular Cones

00

1/4

1/2

3/4

1

1

References: Amelunxen et al. 2013, Mu et al. 2013, McCoy & Tropp 2013


Descent Cone of `1 Norm at Sparse Vector

0 1/4 1/2 3/4 10

1/4

1/2

3/4

1

1

References: Stojnic 2009, Donoho & Tanner 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Mackey & Foygel 2013


Descent Cone of S1 Norm at Low-Rank Matrix

0 1/4 1/2 3/4 1

1/4

1/2

3/4

1

0

References: Oymak & Hassibi 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel & Mackey 2013


.

Regularized LinearInverse Problems


Example: The Random Demodulator

Time (μ s)

Freq

uenc

y (M

Hz)

40.08 80.16 120.23 160.31 200.39

0.01

0.02

0.04

0.05

0.06

0.07

Time (μ s)

Freq

uenc

y (M

Hz)

40.08 80.16 120.23 160.31 200.39

0.01

0.02

0.04

0.05

0.06

0.07

PseudorandomNumberGenerator

Seed

Input to sensor Reconstruct(e.g., convex optimization)

Linear data acquisition system Output from sensor

Reference: Tropp et al. 2010, ...


Setup for Linear Inverse Problems

§ Let x\ ∈ Rd be a structured, unknown vector

§ Let f : Rd → R be a convex function that reflects structure

§ Let A ∈ Rm×d be a measurement operator

§ Observe z = Ax\

§ Find estimate x by solving convex program

minimize f(x) subject to Ax = z

§ Hope: x = x\


Geometry of Linear Inverse Problems

x\ + null(A)

{x : f(x) ≤ f(x\)}x\

x\ + D(f,x\)

Success!

x\ + null(A)

{x : f(x) ≤ f(x\)}x\

x\ + D(f,x\)

Failure!

References: Candes–Romberg–Tao 2005, Rudelson–Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013


Linear Inverse Problems with Random Data

Theorem 2. [CRPW10; ALMT13] Assume

§ The vector x\ ∈ Rd is unknown

§ The observation z = Ax\ where A ∈ Rm×d is standard normal

§ The vector x solves

minimize f(x) subject to Ax = z

Then

m & δ(D(f,x\)

)=⇒ x = x\ whp [CRPW10; ALMT13]

m . δ(D(f,x\)

)=⇒ x 6= x\ whp [ALMT13]

References: Stojnic 2009, Chandrasekaran et al. 2010, Amelunxen et al. 2013


Sparse Recovery via `1 Minimization

0 25 50 75 1000

25

50

75

100


Low-Rank Recovery via S1 Minimization

0 10 20 300

300

600

900


.

DemixingStructured Signals


Example: Demixing Spikes + Sines

0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18!1

!0.5

0

0.5

1

Time (seconds)

axis([A(1:2),1,1]);

Uy0: Signal with sparse DCT

0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18!1

!0.5

0

0.5

1

x0: Sparse noise

Am

plitu

deA

mpl

itude

0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18−1

−0.5

0

0.5

1 Noisy Signal

Am

plitu

de

Time (seconds)

Observation: z = x\ +Uy\ where U is the DCT

References: Starck–Donoho–Candes 2002, Starck–Elad–Donoho 2004


Convex Demixing Yields...

0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18−1

−0.5

0

0.5

1

Time (seconds)

DemixedOriginal

Am

plitu

de

minimize ‖x‖1 + λ ‖y‖1subject to z = x+Uy

References: Starck–Donoho–Candes 2002, Starck–Elad–Donoho 2004


Setup for Demixing Problems

§ Let x\ ∈ Rd and y\ ∈ Rd be structured, unknown vectors

§ Let f, g : Rd → R be convex functions that reflect structure

§ Let U ∈ Rd×d be a known orthogonal matrix

§ Observe z = x\ +Uy\

§ Demix via convex program

minimize f(x) subject to g(y) ≤ g(y\)

x+Uy = z

§ Hope: (x, y) = (x\,y\)


Geometry of Demixing Problems

x\

x\ + D(f,x\)

x\ −UD(g,y\)

Success!

x\

x\ + D(f,x\)

x\ −UD(g,y\)

Failure!

References: McCoy & Tropp 2012, Amelunxen et al. 2013, McCoy & Tropp 2013


Demixing Problems with Random Incoherence

Theorem 3. [MT12, ALMT13] Assume

§ The vectors x\ ∈ Rd and y\ ∈ Rd are unknown

§ The observation z = x\ +Qy\ where Q is random orthogonal

§ The pair (x, y) solves

minimize f(x) subject to g(y) ≤ g(y\)

x+Qy = z

Then

δ(D(f,x\)

)+ δ(D(g,y\)

). d =⇒ (x, y) = (x\,y\) whp

δ(D(f,x\)

)+ δ(D(g,y\)

)& d =⇒ (x, y) 6= (x\,y\) whp


Sparse + Sparse via `1 + `1 Minimization

0 25 50 75 1000

25

50

75

100


Low-Rank + Sparse via S1 + `1 Minimization

0 7 14 21 28 350

245

490

735

980

1225


Statistical Dimension & Phase Transitions

§ Key Question: When do two randomly oriented cones strike?

§ Intuition: When do two randomly oriented subspaces strike?

The Approximate Kinematic Formula

Let C and K be closed convex cones in Rd

δ(C) + δ(K) . d =⇒ P {C ∩QK = {0}} ≈ 1

δ(C) + δ(K) & d =⇒ P {C ∩QK = {0}} ≈ 0

where Q is a random orthogonal matrix

References: Amelunxen et al. 2013, McCoy & Tropp 2013


To learn more...

E-mail: [email protected]

Web: http://users.cms.caltech.edu/~jtropp

Main Papers Discussed:

§ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems.” FOCM 2012§ MT, “Sharp recovery bounds for convex deconvolution, with applications.” FOCM 2014§ ALMT, “Living on the edge: A geometric theory of phase transitions in convex optimization.” I&I 2014§ Oymak & Hassibi, “Asymptotically exact denoising in relation to compressed sensing,” arXiv cs.IT

1305.2714§ MT, “From Steiner formulas for cones to concentration of intrinsic volumes.” DCG 2014§ MT, “The achievable performance of convex demixing,” arXiv cs.IT 1309.7478§ More to come!


Download - Living on the Edge - California Institute of Technologyusers.cms.caltech.edu/~jtropp/slides/Tro14-Living-Edge-Talk.pdfLiving on the Edge ƒ Phase Transitions in Convex Programs with

Top Related