Living on the Edge: Phase Transitions in Convex Programs with Random Data (California Institute of Technology, jtropp/slides/Tro14-Living-Edge-Talk.pdf)


Post on 11-Jun-2018

TRANSCRIPT

<ul><li><p>Living on the Edge</p><p>Phase Transitions in Convex Programs with Random Data</p><p>Joel A. Tropp and Michael B. McCoy</p><p>Computing + Mathematical Sciences, California Institute of Technology</p><p>Joint with Dennis Amelunxen and Martin Lotz (Manchester)</p><p>Research supported in part by ONR, AFOSR, DARPA, and the Sloan Foundation</p></li><li><p>Convex Programs with Random Data</p><p>Examples...</p><p>Stat and ML. Random data models; fit model via optimization</p><p>Sensing. Collect random measurements; reconstruct via optimization</p><p>Coding. Random channel models; decode via optimization</p><p>Motivations...</p><p>Average-case analysis. Randomness describes typical behavior</p><p>Fundamental bounds. Opportunities and limits for convex methods</p><p>Living on the Edge, Modern Time–Frequency Analysis, Strobl, 3 June 2014</p></li><li><p>Research Challenge...</p><p>Understand and predict the precise behavior of random convex programs</p><p>References: Donoho–Maleki–Montanari 2009, Donoho–Johnstone–Montanari 2011, Donoho–Gavish–Montanari 2013</p></li><li><p>A Theory Emerges...</p><p>Vershik &amp; Sporyshev, An asymptotic estimate for the average number of steps... 1986</p><p>Donoho, High-dimensional centrally symmetric polytopes... 2/2005</p><p>Rudelson &amp; Vershynin, On sparse reconstruction... 2/2006</p><p>Donoho &amp; Tanner, Counting faces of randomly projected polytopes... 5/2006</p><p>Xu &amp; Hassibi, Compressed sensing over the Grassmann manifold... 9/2008</p><p>Stojnic, Various thresholds for ℓ1 optimization... 7/2009</p><p>Bayati &amp; Montanari, The LASSO risk for gaussian matrices 8/2010</p><p>Oymak &amp; Hassibi, New null space results and recovery thresholds... 11/2010</p><p>Chandrasekaran, Recht, et al., The convex geometry of linear inverse problems 12/2010</p><p>McCoy &amp; Tropp, Sharp recovery bounds for convex demixing...
5/2012</p><p>Bayati, Lelarge, &amp; Montanari, Universality in polytope phase transitions... 7/2012</p><p>Chandrasekaran &amp; Jordan, Computational &amp; statistical tradeoffs... 10/2012</p><p>Amelunxen, Lotz, McCoy, &amp; Tropp, Living on the edge... 3/2013</p><p>Stojnic, various works 3/2013</p><p>Foygel &amp; Mackey, Corrupted sensing: Novel guarantees... 5/2013</p><p>Oymak &amp; Hassibi, Asymptotically exact denoising... 5/2013</p><p>McCoy &amp; Tropp, From Steiner formulas for cones... 8/2013</p><p>McCoy &amp; Tropp, The achievable performance of convex demixing... 9/2013</p><p>Oymak, Thrampoulidis, &amp; Hassibi, The squared-error of generalized LASSO... 11/2013</p></li><li><p>The Core Question</p><p>How big is a cone?</p></li><li><p>Regularized Denoising</p></li><li><p>Denoising a Piecewise Smooth Signal</p><p>[Figure: two panels. Left: a piecewise smooth function plus additive white noise, overlaid on the original. Right: the denoised piecewise smooth function obtained by wavelet shrinkage, overlaid on the original. Axes: Time vs. Value.]</p><p>Observation: z = x♮ + σg where g ∼ NORMAL(0, I)</p><p>Denoise via wavelet shrinkage = convex optimization:</p><p>minimize (1/2)‖x − z‖₂² + λ‖Wx‖₁</p><p>Reference: Donoho &amp; Johnstone, early 1990s</p></li><li><p>Setup for Regularized Denoising</p><p>Let x♮ ∈ R^d be structured but unknown</p><p>Let f : R^d → R be a convex function that reflects structure</p><p>Observe z = x♮ + σg where g ∼ NORMAL(0, I)</p><p>Remove noise
by solving the convex program*</p><p>minimize (1/2)‖x − z‖₂² subject to f(x) ≤ f(x♮)</p><p>Hope: The minimizer x̂ approximates x♮</p><p>*We assume the side information f(x♮) is available. This is equivalent to knowing the optimal choice of Lagrange multiplier for the constraint.</p></li><li><p>Geometry of Denoising I</p><p>[Figure: the observation z = x♮ + σg and the estimate x̂, the projection of z onto the sublevel set {x : f(x) ≤ f(x♮)}.]</p><p>References: Chandrasekaran &amp; Jordan 2012, Oymak &amp; Hassibi 2013</p></li><li><p>Descent Cones</p><p>Definition. The descent cone of a function f at a point x is</p><p>D(f, x) := {h : f(x + τh) ≤ f(x) for some τ &gt; 0}</p><p>[Figure: the sublevel set {y : f(y) ≤ f(x)}, the set {h : f(x + h) ≤ f(x)}, the cone D(f, x), and the shifted cone x + D(f, x).]</p><p>References: Rockafellar 1970, Hiriart-Urruty &amp; Lemaréchal 1996</p></li><li><p>Geometry of Denoising II</p><p>[Figure: with K = D(f, x♮), the shifted cone x♮ + K sits inside the sublevel set {x : f(x) ≤ f(x♮)}, and the error x̂ − x♮ is the projection Π_K(σg) of the noise onto K.]</p><p>References: Chandrasekaran &amp; Jordan 2012, Oymak &amp; Hassibi 2013</p></li><li><p>The Risk of Regularized Denoising</p><p>Theorem 1.
[Oymak &amp; Hassibi 2013] Assume:</p><p>We observe z = x♮ + σg where g is standard normal</p><p>The vector x̂_σ solves</p><p>minimize (1/2)‖z − x‖₂² subject to f(x) ≤ f(x♮)</p><p>Then</p><p>sup_{σ&gt;0} E‖x̂_σ − x♮‖₂² / σ² = E‖Π_K(g)‖₂²</p><p>where K = D(f, x♮) and Π_K is the Euclidean metric projector onto K.</p><p>Related: Bhaskar–Tang–Recht 2012, Donoho–Johnstone–Montanari 2012, Chandrasekaran &amp; Jordan 2012</p></li><li><p>Statistical Dimension</p></li><li><p>Statistical Dimension: The Motion Picture</p><p>[Figure: for a small cone K, the projection Π_K(g) of a standard normal vector g is typically short; for a big cone, it is typically long.]</p></li><li><p>The Statistical Dimension of a Cone</p><p>Definition. The statistical dimension δ(K) of a closed, convex cone K is the quantity</p><p>δ(K) := E[‖Π_K(g)‖₂²]</p><p>where</p><p>Π_K is the Euclidean metric projector onto K</p><p>g ∼ NORMAL(0, I) is a standard normal vector</p><p>References: Rudelson &amp; Vershynin 2006, Stojnic 2009, Chandrasekaran et al. 2010, Chandrasekaran &amp; Jordan 2012, Amelunxen et al. 2013</p></li><li><p>Basic Statistical Dimension Calculations</p><p>Cone | Notation | Statistical Dimension</p><p>j-dim subspace | L_j | j</p><p>Nonnegative orthant | R^d₊ | d/2</p><p>Second-order cone | L^{d+1} | (d + 1)/2</p><p>Real psd cone | S^d₊ | d(d + 1)/4</p><p>Complex psd cone | H^d₊ | d²/2</p><p>References: Chandrasekaran et al. 2010, Amelunxen et al. 2013</p></li><li><p>Circular Cones</p><p>[Figure: the normalized statistical dimension δ/d of a circular cone as a function of the cone angle; the curve rises from 0 to 1.]</p><p>References: Amelunxen et al. 2013, Mu et al.
2013, McCoy &amp; Tropp 2013</p></li><li><p>Descent Cone of ℓ1 Norm at Sparse Vector</p><p>[Figure: the normalized statistical dimension δ(D(‖·‖₁, x))/d as a function of the sparsity fraction; the curve rises from 0 to 1.]</p><p>References: Stojnic 2009, Donoho &amp; Tanner 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Mackey &amp; Foygel 2013</p></li><li><p>Descent Cone of S1 Norm at Low-Rank Matrix</p><p>[Figure: the normalized statistical dimension of the descent cone of the Schatten 1-norm as a function of the rank fraction; the curve rises from 0 to 1.]</p><p>References: Oymak &amp; Hassibi 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel &amp; Mackey 2013</p></li><li><p>Regularized Linear Inverse Problems</p></li><li><p>Example: The Random Demodulator</p><p>[Figure: a linear data acquisition system. A pseudorandom number generator, driven by a seed, modulates the input to the sensor; time–frequency spectrograms show the input signal and the output reconstructed from the sensor, e.g., by convex optimization.]</p><p>Reference: Tropp et al.
2010, ...</p></li><li><p>Setup for Linear Inverse Problems</p><p>Let x♮ ∈ R^d be a structured, unknown vector</p><p>Let f : R^d → R be a convex function that reflects structure</p><p>Let A ∈ R^{m×d} be a measurement operator</p><p>Observe z = Ax♮</p><p>Find estimate x̂ by solving the convex program</p><p>minimize f(x) subject to Ax = z</p><p>Hope: x̂ = x♮</p></li><li><p>Geometry of Linear Inverse Problems</p><p>[Figure: Success! when the affine space x♮ + null(A) meets the shifted descent cone x♮ + D(f, x♮) only at x♮; Failure! when it crosses the cone.]</p><p>References: Candès–Romberg–Tao 2005, Rudelson–Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013</p></li><li><p>Linear Inverse Problems with Random Data</p><p>Theorem 2. [CRPW10; ALMT13] Assume:</p><p>The vector x♮ ∈ R^d is unknown</p><p>The observation z = Ax♮ where A ∈ R^{m×d} is standard normal</p><p>The vector x̂ solves</p><p>minimize f(x) subject to Ax = z</p><p>Then</p><p>m ≳ δ(D(f, x♮)) implies x̂ = x♮ whp [CRPW10; ALMT13]</p><p>m ≲ δ(D(f, x♮)) implies x̂ ≠ x♮ whp [ALMT13]</p><p>References: Stojnic 2009, Chandrasekaran et al. 2010, Amelunxen et al.
2013</p></li><li><p>Sparse Recovery via ℓ1 Minimization</p><p>[Figure: empirical phase transition for ℓ1 recovery; number of measurements against sparsity, axes from 0 to 100.]</p></li><li><p>Low-Rank Recovery via S1 Minimization</p><p>[Figure: empirical phase transition for Schatten 1-norm recovery; number of measurements against rank.]</p></li><li><p>Demixing Structured Signals</p></li><li><p>Example: Demixing Spikes + Sines</p><p>[Figure: three panels of amplitude against time. Uy♮: a signal with sparse DCT; x♮: sparse noise; and the noisy signal, their sum.]</p><p>Observation: z = x♮ + Uy♮ where U is the DCT</p><p>References: Starck–Donoho–Candès 2002, Starck–Elad–Donoho 2004</p></li><li><p>Convex Demixing Yields...</p><p>[Figure: the demixed signal overlaid on the original.]</p><p>minimize ‖x‖₁ + ‖y‖₁ subject to z = x + Uy</p><p>References: Starck–Donoho–Candès 2002, Starck–Elad–Donoho 2004</p></li><li><p>Setup for Demixing Problems</p><p>Let x♮ ∈ R^d and y♮ ∈ R^d be structured, unknown vectors</p><p>Let f, g : R^d → R be convex functions that reflect structure</p><p>Let U ∈ R^{d×d} be a known orthogonal matrix</p><p>Observe z = x♮ + Uy♮</p><p>Demix via the convex program</p><p>minimize f(x) subject
to g(y) ≤ g(y♮) and x + Uy = z</p><p>Hope: (x̂, ŷ) = (x♮, y♮)</p></li><li><p>Geometry of Demixing Problems</p><p>[Figure: Success! when the shifted cones x♮ + D(f, x♮) and x♮ − U D(g, y♮) meet only at x♮; Failure! when they cross.]</p><p>References: McCoy &amp; Tropp 2012, Amelunxen et al. 2013, McCoy &amp; Tropp 2013</p></li><li><p>Demixing Problems with Random Incoherence</p><p>Theorem 3. [MT12, ALMT13] Assume:</p><p>The vectors x♮ ∈ R^d and y♮ ∈ R^d are unknown</p><p>The observation z = x♮ + Qy♮ where Q is random orthogonal</p><p>The pair (x̂, ŷ) solves</p><p>minimize f(x) subject to g(y) ≤ g(y♮) and x + Qy = z</p><p>Then</p><p>δ(D(f, x♮)) + δ(D(g, y♮)) ≲ d implies (x̂, ŷ) = (x♮, y♮) whp</p><p>δ(D(f, x♮)) + δ(D(g, y♮)) ≳ d implies (x̂, ŷ) ≠ (x♮, y♮) whp</p></li><li><p>Sparse + Sparse via ℓ1 + ℓ1 Minimization</p><p>[Figure: empirical phase transition for demixing two sparse vectors; axes from 0 to 100.]</p></li><li><p>Low-Rank + Sparse via S1 + ℓ1 Minimization</p><p>[Figure: empirical phase transition for demixing a low-rank matrix from a sparse matrix.]</p></li><li><p>Statistical Dimension &amp; Phase Transitions</p><p>Key Question: When do two randomly oriented cones strike?</p><p>Intuition: When do two randomly oriented subspaces strike?</p><p>The Approximate Kinematic Formula</p><p>Let C and K be closed convex cones in R^d. Then</p><p>δ(C) + δ(K) ≲ d implies P{C ∩ QK = {0}} ≈ 1</p><p>δ(C) + δ(K) ≳ d implies P{C ∩ QK = {0}} ≈ 0</p><p>where Q is a random orthogonal matrix</p><p>References: Amelunxen et al.
2013, McCoy &amp; Tropp 2013</p></li><li><p>To learn more...</p><p>E-mail: jtropp@cms.caltech.edu</p><p>Web: http://users.cms.caltech.edu/~jtropp</p><p>Main Papers Discussed:</p><p>Chandrasekaran, Recht, et al., The convex geometry of linear inverse problems. FOCM 2012</p><p>MT, Sharp recovery bounds for convex deconvolution, with applications. FOCM 2014</p><p>ALMT, Living on the edge: A geometric theory of phase transitions in convex optimization. I&amp;I 2014</p><p>Oymak &amp; Hassibi, Asymptotically exact denoising in relation to compressed sensing, arXiv cs.IT 1305.2714</p><p>MT, From Steiner formulas for cones to concentration of intrinsic volumes. DCG 2014</p><p>MT, The achievable performance of convex demixing, arXiv cs.IT 1309.7478</p><p>More to come!</p></li></ul>
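The definition δ(K) := E[‖Π_K(g)‖₂²] from the statistical-dimension slides is easy to test numerically whenever the projector Π_K has a closed form. A minimal Monte Carlo sketch in Python (the helper name `stat_dim_orthant` is ours, not from the talk): for the nonnegative orthant, projection is coordinatewise clipping at zero, and the table of basic calculations predicts δ(R^d₊) = d/2.

```python
import numpy as np

def stat_dim_orthant(d, trials=20000, seed=0):
    """Monte Carlo estimate of delta(K) = E ||Pi_K(g)||_2^2 for the
    nonnegative orthant K = R^d_+, whose metric projector is max(g, 0)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, d))
    proj = np.maximum(g, 0.0)               # Pi_K(g), coordinatewise clipping
    return (proj ** 2).sum(axis=1).mean()   # average squared norm of projection

# The slide table gives delta(R^d_+) = d/2, so for d = 100 the
# estimate should hover near 50.
print(stat_dim_orthant(100))
```

The same loop estimates δ for any cone whose projector you can write down, which is one way to sanity-check the other table entries.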
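Theorem 2 says that, for a standard normal A, ℓ1 minimization recovers a sparse x♮ once m comfortably exceeds δ(D(‖·‖₁, x♮)) and fails when m falls short. A small experiment sketching this, assuming SciPy is available (the helper `l1_recover` and the specific problem sizes are ours): the ℓ1 problem is posed as a linear program via the standard split x = u − v with u, v ≥ 0.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, z):
    """Solve  minimize ||x||_1  subject to  Ax = z  as a linear program."""
    m, d = A.shape
    c = np.ones(2 * d)                   # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])            # equality constraint A(u - v) = z
    res = linprog(c, A_eq=A_eq, b_eq=z, bounds=(0, None), method="highs")
    return res.x[:d] - res.x[d:]

rng = np.random.default_rng(1)
d, s = 100, 5                            # ambient dimension, sparsity
x_true = np.zeros(d)
x_true[:s] = rng.standard_normal(s)

# m = 15 sits below the phase transition and typically fails;
# m = 60 sits well above it and recovers x_true with high probability.
for m in (15, 60):
    A = rng.standard_normal((m, d))      # standard normal measurement map
    x_hat = l1_recover(A, A @ x_true)
    print(m, np.linalg.norm(x_hat - x_true) < 1e-5)
```

Sweeping m and the sparsity s and averaging over trials reproduces the empirical phase-transition diagrams shown in the sparse-recovery slides.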
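The subspace intuition behind the approximate kinematic formula can be made concrete, since δ(L_j) = j by the slide table: two independently oriented subspaces of dimensions a and b in R^d share a nonzero vector precisely when a + b > d, which is the δ(C) + δ(K) versus d dichotomy with no transition width at all. A sketch (the function name is ours):

```python
import numpy as np

def subspaces_intersect(a, b, d, rng):
    """Do independent random a- and b-dimensional subspaces of R^d
    intersect nontrivially?  Almost surely this happens iff a + b > d:
    the stacked bases drop rank exactly when the subspaces overlap."""
    A = rng.standard_normal((d, a))      # columns span the first subspace
    B = rng.standard_normal((d, b))      # columns span the second subspace
    return bool(np.linalg.matrix_rank(np.hstack([A, B])) < a + b)

rng = np.random.default_rng(0)
print(subspaces_intersect(8, 9, 20, rng))    # 8 + 9 < 20: only {0} in common
print(subspaces_intersect(12, 13, 20, rng))  # 12 + 13 > 20: a shared ray exists
```

For general convex cones the sharp cutoff blurs into a transition of width on the order of √d, which is exactly the content of the approximate kinematic formula on the final technical slide.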