Living on the Edge
Phase Transitions in Convex Programs with Random Data

Joel A. Tropp and Michael B. McCoy
Computing + Mathematical Sciences, California Institute of Technology
Joint with Dennis Amelunxen and Martin Lotz (Manchester)
Research supported in part by ONR, AFOSR, DARPA, and the Sloan Foundation
Presented at Modern Time-Frequency Analysis, Strobl, 3 June 2014


Convex Programs with Random Data

Examples...
- Stat and ML. Random data models; fit model via optimization
- Sensing. Collect random measurements; reconstruct via optimization
- Coding. Random channel models; decode via optimization

Motivations...
- Average-case analysis. Randomness describes typical behavior
- Fundamental bounds. Opportunities and limits for convex methods


Research Challenge...

Understand and predict the precise behavior of random convex programs.

References: Donoho-Maleki-Montanari 2009, Donoho-Johnstone-Montanari 2011, Donoho-Gavish-Montanari 2013


A Theory Emerges...

- Vershik & Sporyshev, An asymptotic estimate for the average number of steps..., 1986
- Donoho, High-dimensional centrally symmetric polytopes..., 2/2005
- Rudelson & Vershynin, On sparse reconstruction..., 2/2006
- Donoho & Tanner, Counting faces of randomly projected polytopes..., 5/2006
- Xu & Hassibi, Compressed sensing over the Grassmann manifold..., 9/2008
- Stojnic, Various thresholds for ℓ1 optimization..., 7/2009
- Bayati & Montanari, The LASSO risk for gaussian matrices, 8/2010
- Oymak & Hassibi, New null space results and recovery thresholds..., 11/2010
- Chandrasekaran, Recht, et al., The convex geometry of linear inverse problems, 12/2010
- McCoy & Tropp, Sharp recovery bounds for convex demixing..., 5/2012
- Bayati, Lelarge, & Montanari, Universality in polytope phase transitions..., 7/2012
- Chandrasekaran & Jordan, Computational & statistical tradeoffs..., 10/2012
- Amelunxen, Lotz, McCoy, & Tropp, Living on the edge..., 3/2013
- Stojnic, various works, 3/2013
- Foygel & Mackey, Corrupted sensing: Novel guarantees..., 5/2013
- Oymak & Hassibi, Asymptotically exact denoising..., 5/2013
- McCoy & Tropp, From Steiner formulas for cones..., 8/2013
- McCoy & Tropp, The achievable performance of convex demixing..., 9/2013
- Oymak, Thrampoulidis, & Hassibi, The squared-error of generalized LASSO..., 11/2013


The Core Question

How big is a cone?


Regularized Denoising


Denoising a Piecewise Smooth Signal

[Figure: a piecewise smooth function with additive white noise, and the denoised signal obtained by wavelet shrinkage; Time vs. Value, the original plotted against the noisy and denoised versions.]

Observation: z = x♮ + σg where g ~ NORMAL(0, I)
Denoise via wavelet shrinkage = convex optimization:

    minimize  ½ ‖x − z‖₂² + λ ‖Wx‖₁

Reference: Donoho & Johnstone, early 1990s


Setup for Regularized Denoising

- Let x♮ ∈ R^d be structured but unknown
- Let f : R^d → R be a convex function that reflects structure
- Observe z = x♮ + σg where g ~ NORMAL(0, I)
- Remove noise by solving the convex program*

    minimize  ½ ‖x − z‖₂²  subject to  f(x) ≤ f(x♮)

- Hope: the minimizer x̂ approximates x♮

*We assume the side information f(x♮) is available. This is equivalent to knowing the optimal choice of Lagrange multiplier for the constraint.
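
To make the constrained program concrete, here is a minimal numerical sketch (not from the slides) for the special case f = ℓ1 norm. With that choice, the program reduces to Euclidean projection of z onto the ℓ1 ball of radius ‖x♮‖₁, which can be computed exactly by a sort-based routine; the sizes d, s, and sigma below are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= radius}
    (standard sort-based algorithm; radius is assumed positive)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                       # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)          # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

# Denoising demo with illustrative sizes: x_nat is s-sparse, sigma is the noise level.
rng = np.random.default_rng(0)
d, s, sigma = 1000, 25, 0.5
x_nat = np.zeros(d)
x_nat[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
z = x_nat + sigma * rng.standard_normal(d)

# minimize (1/2)||x - z||_2^2 subject to ||x||_1 <= ||x_nat||_1
# is exactly Euclidean projection of z onto the l1 ball of radius ||x_nat||_1.
x_hat = project_l1_ball(z, np.abs(x_nat).sum())
print("relative error:", np.linalg.norm(x_hat - x_nat) / np.linalg.norm(x_nat))
```

For a general convex f the same constrained program can be handed to an off-the-shelf convex solver; the next slides explain how the geometry of the constraint set near x♮, namely its descent cone, controls the size of the error.
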
Geometry of Denoising I

[Figure: the sublevel set {x : f(x) ≤ f(x♮)}, the observation z = x♮ + σg, and the estimate x̂ obtained by projecting z onto the sublevel set.]

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013


Descent Cones

Definition. The descent cone of a function f at a point x is

    D(f, x) := {h : f(x + τh) ≤ f(x) for some τ > 0}

[Figure: the sublevel set {y : f(y) ≤ f(x)}, the descent cone D(f, x) at the origin, and its translate x + D(f, x).]

References: Rockafellar 1970, Hiriart-Urruty & Lemaréchal 1996


Geometry of Denoising II

[Figure: near x♮ the sublevel set {x : f(x) ≤ f(x♮)} is approximated by the shifted cone x♮ + K with K = D(f, x♮); the estimate is x̂ = x♮ + Π_K(σg).]

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013


The Risk of Regularized Denoising

Theorem 1. [Oymak & Hassibi 2013] Assume
- We observe z = x♮ + σg where g is standard normal
- The vector x̂ solves

    minimize  ½ ‖z − x‖₂²  subject to  f(x) ≤ f(x♮)

Then

    sup_{σ>0}  E ‖x̂ − x♮‖₂² / σ²  =  E ‖Π_K(g)‖₂²

where K = D(f, x♮) and Π_K is the Euclidean metric projector onto K.

Related: Bhaskar-Tang-Recht 2012, Donoho-Johnstone-Montanari 2012, Chandrasekaran & Jordan 2012


Statistical Dimension


Statistical Dimension: The Motion Picture

[Figure: the projection Π_K(g) of a standard normal vector g onto a small cone versus a big cone.]


The Statistical Dimension of a Cone

Definition. The statistical dimension δ(K) of a closed, convex cone K is the quantity

    δ(K) := E[ ‖Π_K(g)‖₂² ]

where
- Π_K is the Euclidean metric projector onto K
- g ~ NORMAL(0, I) is a standard normal vector

References: Rudelson & Vershynin 2006, Stojnic 2009, Chandrasekaran et al. 2010, Chandrasekaran & Jordan 2012, Amelunxen et al. 2013


Basic Statistical Dimension Calculations

    Cone                   Notation    Statistical dimension
    j-dim subspace         L_j         j
    Nonnegative orthant    R^d_+       ½ d
    Second-order cone      L^{d+1}     ½ (d + 1)
    Real psd cone          S^d_+       ¼ d(d + 1)
    Complex psd cone       H^d_+       ½ d²

References: Chandrasekaran et al. 2010, Amelunxen et al. 2013
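
The entries in this table can be checked directly from the definition. The sketch below (not part of the talk) estimates δ(K) = E‖Π_K(g)‖₂² by Monte Carlo for two cones whose metric projectors have a simple closed form; the function names, dimensions, and trial counts are illustrative.

```python
import numpy as np

def statdim_monte_carlo(project, d, trials=2000, seed=0):
    """Estimate delta(K) = E ||Pi_K(g)||_2^2 given the metric projector Pi_K."""
    rng = np.random.default_rng(seed)
    sq_norms = [np.linalg.norm(project(rng.standard_normal(d))) ** 2
                for _ in range(trials)]
    return float(np.mean(sq_norms))

d, j = 100, 30
# Nonnegative orthant R^d_+: the projector clips negative coordinates; delta = d/2.
print(statdim_monte_carlo(lambda g: np.maximum(g, 0.0), d))                         # ~ 50
# j-dimensional coordinate subspace L_j: the projector zeroes the last d - j entries; delta = j.
print(statdim_monte_carlo(lambda g: np.concatenate([g[:j], np.zeros(d - j)]), d))   # ~ 30
```

For the orthant the estimate concentrates near d/2, and for the coordinate subspace near j, matching the table. Cones whose projectors are not available in closed form, such as the descent cones on the next slides, are handled instead through the formulas in the cited references.
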
Circular Cones

[Figure: normalized statistical dimension curve for circular cones; both axes marked 0, 1/4, 1/2, 3/4, 1.]

References: Amelunxen et al. 2013, Mu et al. 2013, McCoy & Tropp 2013


Descent Cone of ℓ1 Norm at Sparse Vector

[Figure: normalized statistical dimension of the descent cone of the ℓ1 norm at an s-sparse vector, plotted against the sparsity fraction s/d; both axes run from 0 to 1.]

References: Stojnic 2009, Donoho & Tanner 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Mackey & Foygel 2013


Descent Cone of S1 Norm at Low-Rank Matrix

[Figure: normalized statistical dimension of the descent cone of the S1 (nuclear) norm at a low-rank matrix, plotted against the normalized rank; both axes run from 0 to 1.]

References: Oymak & Hassibi 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel & Mackey 2013


Regularized Linear Inverse Problems


Example: The Random Demodulator

[Figure: block diagram of the random demodulator. A pseudorandom number generator (driven by a seed) modulates the input to the sensor; a linear data acquisition system produces the output from the sensor, and the signal is reconstructed, e.g., by convex optimization. Spectrograms (Time in µs vs. Frequency in MHz) compare the input with the reconstruction.]

Reference: Tropp et al. 2010, ...


Setup for Linear Inverse Problems

- Let x♮ ∈ R^d be a structured, unknown vector
- Let f : R^d → R be a convex function that reflects structure
- Let A ∈ R^{m×d} be a measurement operator
- Observe z = Ax♮
- Find an estimate x̂ by solving the convex program

    minimize  f(x)  subject to  Ax = z

- Hope: x̂ = x♮


Geometry of Linear Inverse Problems

[Figure: success occurs when the affine space x♮ + null(A) meets the translated descent cone x♮ + D(f, x♮) only at x♮; failure occurs when they share another point.]

References: Candès-Romberg-Tao 2005, Rudelson-Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013


Linear Inverse Problems with Random Data

Theorem 2. [CRPW10; ALMT13] Assume
- The vector x♮ ∈ R^d is unknown
- The observation z = Ax♮, where A ∈ R^{m×d} is standard normal
- The vector x̂ solves

    minimize  f(x)  subject to  Ax = z

Then

    m ≳ δ(D(f, x♮))  ⟹  x̂ = x♮ whp   [CRPW10; ALMT13]
    m ≲ δ(D(f, x♮))  ⟹  x̂ ≠ x♮ whp   [ALMT13]

References: Stojnic 2009, Chandrasekaran et al. 2010, Amelunxen et al. 2013
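
As a concrete check on Theorem 2 (not from the talk), the sketch below evaluates the standard upper bound for δ(D(‖·‖₁, x♮)) at an s-sparse vector (the expected squared distance from a Gaussian vector to scaled copies of the ℓ1 subdifferential, minimized over the scaling, which is accurate up to lower-order terms in d), and then runs one ℓ1-minimization recovery with a Gaussian A whose row count m sits comfortably above that estimate. The dimensions, the margin added to m, and the use of the cvxpy solver are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def l1_statdim(d, s):
    """Upper bound on delta(D(||.||_1, x)) for an s-sparse x in R^d:
    minimize over tau >= 0 of E dist^2(g, tau * subdifferential of ||.||_1 at x)."""
    def J(tau):
        # On the support: (g_i - tau*sign(x_i))^2 has mean 1 + tau^2.
        # Off the support: E (|g_i| - tau)_+^2 = 2[(1 + tau^2)*Phi_bar(tau) - tau*phi(tau)].
        off = 2 * ((1 + tau**2) * norm.sf(tau) - tau * norm.pdf(tau))
        return s * (1 + tau**2) + (d - s) * off
    return minimize_scalar(J, bounds=(0.0, 10.0), method="bounded").fun

rng = np.random.default_rng(0)
d, s = 200, 20
delta = l1_statdim(d, s)
m = int(np.ceil(delta)) + 30        # a modest margin above delta(D); the transition width is O(sqrt(d))

x_nat = np.zeros(d)
x_nat[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, d))
z = A @ x_nat

# Solve: minimize ||x||_1 subject to Ax = z.
x = cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == z]).solve()
print(f"delta(D) ~ {delta:.1f}, m = {m}, "
      f"recovery error = {np.linalg.norm(x.value - x_nat):.2e}")
```

Repeating the trial over a grid of (s, m) pairs reproduces empirical phase-transition diagrams of the kind shown on the next slide, with the transition located at m ≈ δ(D(‖·‖₁, x♮)) as the theorem predicts.
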
Sparse Recovery via ℓ1 Minimization

[Figure: empirical phase transition for sparse recovery via ℓ1 minimization; both axes run from 0 to 100.]


Low-Rank Recovery via S1 Minimization

[Figure: empirical phase transition for low-rank recovery via S1 minimization; horizontal axis 0 to 30, vertical axis 0 to 900.]


Demixing Structured Signals


Example: Demixing Spikes + Sines

[Figure: three panels over Time (seconds) vs. Amplitude: Uy♮, a signal with sparse DCT; x♮, sparse noise; and the resulting noisy signal.]

Observation: z = x♮ + Uy♮ where U is the DCT

References: Starck-Donoho-Candès 2002, Starck-Elad-Donoho 2004


Convex Demixing Yields...

[Figure: the demixed signal plotted against the original, Time (seconds) vs. Amplitude.]

    minimize  ‖x‖₁ + ‖y‖₁  subject to  z = x + Uy

References: Starck-Donoho-Candès 2002, Starck-Elad-Donoho 2004


Setup for Demixing Problems

- Let x♮ ∈ R^d and y♮ ∈ R^d be structured, unknown vectors
- Let f, g : R^d → R be convex functions that reflect structure
- Let U ∈ R^{d×d} be a known orthogonal matrix
- Observe z = x♮ + Uy♮
- Demix via the convex program

    minimize  f(x)  subject to  g(y) ≤ g(y♮),  x + Uy = z

- Hope: (x̂, ŷ) = (x♮, y♮)


Geometry of Demixing Problems

[Figure: success occurs when the translated cones x♮ + D(f, x♮) and x♮ − U D(g, y♮) meet only at x♮; failure occurs when they share another point.]

References: McCoy & Tropp 2012, Amelunxen et al. 2013, McCoy & Tropp 2013


Demixing Problems with Random Incoherence

Theorem 3. [MT12, ALMT13] Assume
- The vectors x♮ ∈ R^d and y♮ ∈ R^d are unknown
- The observation z = x♮ + Qy♮ where Q is a random orthogonal matrix
- The pair (x̂, ŷ) solves

    minimize  f(x)  subject to  g(y) ≤ g(y♮),  x + Qy = z

Then

    δ(D(f, x♮)) + δ(D(g, y♮)) ≲ d  ⟹  (x̂, ŷ) = (x♮, y♮) whp
    δ(D(f, x♮)) + δ(D(g, y♮)) ≳ d  ⟹  (x̂, ŷ) ≠ (x♮, y♮) whp


Sparse + Sparse via ℓ1 + ℓ1 Minimization

[Figure: empirical phase transition for demixing two sparse vectors via ℓ1 + ℓ1 minimization; both axes run from 0 to 100.]


Low-Rank + Sparse via S1 + ℓ1 Minimization

[Figure: empirical phase transition for demixing a low-rank matrix and a sparse matrix via S1 + ℓ1 minimization; horizontal axis 0 to 35, vertical axis 0 to 1225.]


Statistical Dimension & Phase Transitions

- Key Question: When do two randomly oriented cones strike, that is, share a ray?
- Intuition: When do two randomly oriented subspaces strike?

The Approximate Kinematic Formula. Let C and K be closed convex cones in R^d. Then

    δ(C) + δ(K) ≲ d  ⟹  P{C ∩ QK = {0}} ≈ 1
    δ(C) + δ(K) ≳ d  ⟹  P{C ∩ QK = {0}} ≈ 0

where Q is a random orthogonal matrix. (A numerical illustration appears after the final slide.)

References: Amelunxen et al. 2013, McCoy & Tropp 2013


To learn more...

E-mail: jtropp@cms.caltech.edu
Web: http://users.cms.caltech.edu/~jtropp

Main Papers Discussed:
- Chandrasekaran, Recht, et al., The convex geometry of linear inverse problems. FOCM 2012
- MT, Sharp recovery bounds for convex deconvolution, with applications. FOCM 2014
- ALMT, Living on the edge: A geometric theory of phase transitions in convex optimization. I&I 2014
- Oymak & Hassibi, Asymptotically exact denoising in relation to compressed sensing. arXiv cs.IT 1305.2714
- MT, From Steiner formulas for cones to concentration of intrinsic volumes. DCG 2014
- MT, The achievable performance of convex demixing. arXiv cs.IT 1309.7478

More to come!
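
Appended below (not part of the original deck) is a small numerical check of the approximate kinematic formula. It takes C to be a j-dimensional coordinate subspace, so δ(C) = j, and K the nonnegative orthant, so δ(K) = d/2, draws a random orthogonal Q, and tests whether C ∩ QK ≠ {0} as a linear-programming feasibility problem. The dimensions and trial counts are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def random_orthogonal(d, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))          # sign correction for Haar measure

def intersects_nontrivially(j, Q):
    """Does span(e_1, ..., e_j) meet Q * (nonnegative orthant) outside {0}?
    Equivalent LP feasibility: find x >= 0 with sum(x) = 1 and (Qx)_{j+1..d} = 0."""
    d = Q.shape[0]
    A_eq = np.vstack([Q[j:, :], np.ones((1, d))])
    b_eq = np.concatenate([np.zeros(d - j), [1.0]])
    res = linprog(np.zeros(d), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.status == 0                  # 0 = feasible (solved), 2 = infeasible

d, trials = 60, 200
rng = np.random.default_rng(1)
for j in (10, 20, 30, 40, 50):              # delta(C) + delta(K) = j + d/2
    hits = sum(not intersects_nontrivially(j, random_orthogonal(d, rng))
               for _ in range(trials))
    print(f"j = {j:2d}: empirical P(trivial intersection) = {hits / trials:.2f}")
```

The empirical probability of a trivial intersection swings from near 1 to near 0 as j crosses d/2, i.e., as δ(C) + δ(K) crosses d, with a transition whose width is on the order of √d, as the kinematic formula predicts.
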