# Multifaceted Algorithm Design (Richard Peng, M.I.T.)

Post on 17-Jan-2016


Slide 1

Multifaceted Algorithm Design

Richard Peng, M.I.T.

Slide 2

Large Scale Problems

Emphasis on efficient algorithms in:

- Scientific computing
- Graph theory
- (Randomized) numerical routines

Network analysis, physical simulation, optimization

Slide 3

Well Studied Questions

- Scientific computing: fast solvers for structured linear systems
- Graphs / combinatorics: network flow problems
- Randomized algorithms: subsampling matrices and optimization formulations

Slide 4

My Representative Results

- Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices ($Lx = b$)
- First nearly-linear time algorithm for approximate undirected maxflow
- First near-optimal routine for row sampling matrices in a 1-norm preserving manner ($B \to B'$)

Slide 5

Recurring Ideas

- Can solve a problem by iteratively solving several similar instances
- Approximations lead to better approximations
- Larger problems can be approximated by smaller ones (figure labels: data, approximator)

Slide 6

My Approach to Algorithm Design

Identify problems that arise at the intersection of multiple areas and study them from multiple angles:

- Combinatorics / discrete algorithms
- Numerical analysis / optimization
- Statistics / randomized algorithms
- Problems at their intersection

This talk: structure-preserving sampling

Slide 7

Sampling

Classical use in statistics:

- Extract info from a large data set
- Directly output result (estimator)

Sampling from matrices, networks, and optimization problems:

- Often compute on the sample
- Need to preserve more structure

Slide 8

Preserving Graph Structures

- Undirected graph, $n$ vertices, $m < n^2$ edges
- Are $n^2$ edges (dense) sometimes necessary?
- For some information, e.g. connectivity: encoded by a spanning forest, $< n$ edges
- Deterministic, $O(m)$ time algorithm

Slide 9

More Intricate Structures

- $k$-connectivity: number of disjoint paths between $s$ and $t$
- Menger's theorem / maxflow-mincut
- Cut: number of edges leaving a subset of vertices
- Stronger: weights of all $2^n$ cuts in graphs
- [Benczur-Karger '96]: for ANY $G$, can sample to get $H$ with $O(n \log n)$ edges s.t. $G \approx H$ on all cuts
- $\approx$: multiplicative approximation
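The spanning-forest observation on the "Preserving Graph Structures" slide can be illustrated with a minimal sketch. The function name `spanning_forest` and the toy graph are assumptions made for illustration; they are not from the talk. The traversal is a plain depth-first search, which runs in time linear in the number of vertices and edges.

```python
from collections import defaultdict


def spanning_forest(n, edges):
    """Return (forest_edges, labels) for an undirected graph on vertices 0..n-1.

    labels[v] is the smallest vertex in v's connected component, so two
    graphs on the same vertex set have the same connectivity exactly when
    their label lists are equal.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = [None] * n
    forest = []
    for root in range(n):
        if labels[root] is not None:
            continue
        # Start a new tree of the forest at the first unvisited vertex.
        labels[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] is None:
                    labels[v] = root
                    forest.append((u, v))  # edge that first reaches v
                    stack.append(v)
    return forest, labels


# Hypothetical toy graph: a triangle with a doubled edge, a separate edge,
# and an isolated vertex.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (0, 2)]
forest, labels = spanning_forest(6, edges)
_, forest_labels = spanning_forest(6, forest)
print(len(edges), "edges reduced to", len(forest), "forest edges")
print("same connectivity:", labels == forest_labels)
```

The forest keeps fewer than $n$ edges yet answers every connectivity query the original graph does, which is the simplest instance of replacing a graph by a smaller one that preserves a chosen structure.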

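Slide 10

How to sample?

- Widely used: uniform sampling
- Works well when data is uniform, e.g. complete graph
- Problem: long path, removing any edge changes connectivity (can also have both in one graph)
- More systematic view of sampling?
- Finding hay in a haystack (coherent) vs. finding a needle in a haystack (incoherent)

Slide 11

Algebraic Representation of Graphs

Graph with $n$ vertices and $m$ edges.

Graph Laplacian matrix $L$ ($n$ rows / columns, $O(m)$ non-zeros):

- Diagonal: degree
- Off-diagonal: $-$edge weights

Edge-vertex incidence matrix $B$ ($m$ rows, $n$ columns):

- $B_{eu} = \pm 1$ if $u$ is an endpoint of $e$, $0$ otherwise

$L$ is the Gram matrix of $B$, $L = B^T B$:

$$L = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & 1 \end{pmatrix}$$

Slide 12

Spectral Similarity

- Numerical analysis: $L_G \approx L_H$ if $x^T L_G x \approx x^T L_H x$ for all vectors $x$
- $x \in \{0, 1\}^V$: $G \approx H$ on all cuts
- Gram matrix: $L_G = B_G^T B_G$, so $x^T L_G x = \|B_G x\|_2^2$
- $\|y\|_2^2 = \sum_i y_i^2$
- For edge $e = uv$, $(B_{e:} x)^2 = (x_u - x_v)^2$
- Figure: with $x_u = 1$, $x_z = 1$, $x_v = 0$, an edge whose endpoints have values 1 and 0 contributes $(1 - 0)^2 = 1$, and an edge whose endpoints both have value 1 contributes $(1 - 1)^2 = 0$
- $\|B_G x\|_2^2$ = size of the cut given by $x$
- $\|B_G x\|_2 \approx \|B_H x\|_2 \;\; \forall x$

Slide 13

Algebraic View of Sampling Edges

- $L_2$ row sampling: given $B$ with $m \gg n$, sample a few rows to form $B'$ s.t. $\|Bx\|_2 \approx \|B'x\|_2 \;\; \forall x$
- Note: normally use $A$ instead of $B$, $n$ and $d$ instead of $m$ and $n$
- Figure: an $m \times n$ matrix $B$ whose row $(0, -1, 0, 0, 0, 1, 0)$ appears in $B'$ rescaled to $(0, -5, 0, 0, 0, 5, 0)$

Slide 14

Importance Sampling

- Keep a row, $b_i$, with probability $p_i$; rescale if kept to maintain expectation
- Uniform sampling: $p_i = 1/k$ for a factor $k$ size reduction. Issue: only one non-zero row
- Norm sampling: $p_i = \frac{m}{k} \cdot \frac{\|b_i\|_2^2}{\|B\|_F^2}$. Issue: column with one entry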
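The algebraic view on the slides above can be checked numerically. The sketch below builds an incidence matrix for the 3-vertex example, confirms that $L = B^T B$ and that $x^T L x$ counts cut edges for a 0/1 vector $x$, and shows one way to rescale kept rows so the sampled Gram matrix is correct in expectation. The helper names `incidence_matrix` and `sample_rows`, and the choice to rescale by $1/\sqrt{p_i}$, are assumptions for illustration rather than code from the talk.

```python
import numpy as np


def incidence_matrix(n, edges):
    """Edge-vertex incidence matrix: one row per edge, +1 and -1 at its endpoints."""
    B = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        B[row, u] = 1.0
        B[row, v] = -1.0
    return B


def sample_rows(B, probs, rng):
    """Keep row i with probability probs[i], rescaled by 1/sqrt(probs[i]).

    With this rescaling, the expectation of B'.T @ B' equals B.T @ B.
    """
    keep = rng.random(B.shape[0]) < probs
    return B[keep] / np.sqrt(probs[keep])[:, None]


# Vertex 0 joined to vertices 1 and 2, as in the 3x3 Laplacian on the slide.
edges = [(0, 1), (0, 2)]
B = incidence_matrix(3, edges)
L = B.T @ B  # diagonal: degrees, off-diagonal: -1 per edge

# Indicator vector of the vertex set {0, 2}; only edge (0, 1) leaves it.
x = np.array([1.0, 0.0, 1.0])
cut_size = x @ L @ x

# Keeping every row with probability 1 reproduces B exactly.
rng = np.random.default_rng(0)
B_all = sample_rows(B, np.ones(len(edges)), rng)
print(L)
print("cut size:", cut_size)
```

Rows kept with small probability are scaled up, which is why a sampled row in the slide's figure has larger entries than the original row.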

Slide 15

The "right" probabilities

- $b_i$: row $i$ of $B$, $L = B^T B$
- $\tau$: $L_2$ statistical leverage scores, $\tau_i = b_i^T (B^T B)^{-1} b_i = \|b_i\|_{L^{-1}}^2$
- Only one non-zero row (figure values: 0, 0, 1, 0, 0)
- Column with one entry (figure values: $n/m$, $n/m$, $n/m$, $n/m$, 1)
- Path + clique (figure values: 1 and $1/n$)

Slide 16

$L_2$ Matrix-Chernoff Bounds

- $\tau$: $L_2$ statistical leverage scores, $\tau_i = b_i^T (B^T B)^{-1} b_i = \|b_i\|_{L^{-1}}^2$
- [Foster '49]: $\sum_i \tau_i = \text{rank} \le n$, giving $O(n \log n)$ rows
- [Rudelson, Vershynin '07], [Tropp '12]: sampling with $p_i \ge \tau_i \, O(\log n)$ gives $B'$ s.t. $\|Bx\|_2 \approx \|B'x\|_2 \;\; \forall x$ w.h.p.
- Near optimal: $L_2$-row samples of $B$, graph sparsifiers
- In practice $O(\log n) \to 5$ usually suffices
- Can also improve via derandomization

Slide 17

My Approach to Algorithm Design

Extend insights gained from studying problems at the intersection of multiple areas back to these areas:

- Combinatorics / discrete algorithms
- Numerical analysis / optimization
- Statistics / randomized algorithms
- Problems at their intersection

Algorithmic extensions of structure-preserving sampling:

- Maximum flow
- Solving linear systems
- Preserving $L_1$-structures

Slide 18

Summary

- Algorithm design approach: study problems at the intersection of areas, and extend insights back.
- Can sparsify objects via importance sampling.
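The importance-sampling summary rests on the leverage scores $\tau_i$, which can be computed directly on a small "path + clique" graph. This is a minimal sketch under stated assumptions: a graph Laplacian is singular, so the code substitutes the Moore-Penrose pseudoinverse for the inverse written on the slides, and the helper names and the specific graph are hypothetical.

```python
import numpy as np


def incidence_matrix(n, edges):
    """Edge-vertex incidence matrix: one row per edge, +1 and -1 at its endpoints."""
    B = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        B[row, u] = 1.0
        B[row, v] = -1.0
    return B


def leverage_scores(B):
    """tau_i = b_i^T (B^T B)^+ b_i, using a pseudoinverse (assumption: L is singular)."""
    L_pinv = np.linalg.pinv(B.T @ B)
    # Row-wise quadratic form b_i^T L_pinv b_i.
    return np.einsum("ij,jk,ik->i", B, L_pinv, B)


# Hypothetical graph: a clique on vertices 0..k-1 with a 2-edge path attached.
k = 6
clique_edges = [(u, v) for u in range(k) for v in range(u + 1, k)]
path_edges = [(k - 1, k), (k, k + 1)]
n = k + 2
B = incidence_matrix(n, clique_edges + path_edges)

tau = leverage_scores(B)
tau_clique = tau[: len(clique_edges)]
tau_path = tau[len(clique_edges):]

print("clique edge scores:", np.round(tau_clique[:3], 4))
print("path edge scores:  ", np.round(tau_path, 4))
print("sum of scores:     ", round(float(tau.sum()), 4), "rank:", n - 1)
```

Path edges are the only connection to their endpoints and get score 1, so they must always be kept, while each clique edge gets a score on the order of $1/k$ and can be sampled sparsely. The scores sum to the rank of $B$, which is what bounds the expected sample size.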

Slide 19

Solvers for linear systems involving graph Laplacians

- Graph Laplacian: diagonal: degree, off-diagonal: $-$weight
- Areas: combinatorics / discrete algorithms, numerical analysis / optimization, statistics / randomized algorithms
- Current fastest sequential and parallel solvers for linear systems in graph Laplacians, $Lx = b$
- Application: estimate all $\tau_i = \|b_i\|_{L^{-1}}^2$ by solving $O(\log n)$ linear systems
- Directly related to: elliptic problems; SDD, M, and H-matrices

Slide 20

Algorithms for $Lx = b$

Given any graph Laplacian $L$ with $n$ vertices and $m$ edges, and any vector $b$, find a vector $x$ s.t. $Lx = b$.

- [Vaidya '89]: use graph theory!
- [Spielman-Teng '04]: $O(m \log^c n)$
- [P-Spielman '14]: alternate, fully parallelizable approach
- Log-log plot of $c$: 2004: 70, 2006: 32, 2009: 15, 2010: 6, 2010: 2, 2011: 1, 2014: 1/2

Slide 21

Iterative methods

- Division using multiplication: $I + A + A^2 + A^3 + \dots = (I - A)^{-1} = L^{-1}$
- Spectral theorem: can view as scalars
- Simplification: assume $L = I - A$, where $A$ is the transition matrix of a random walk
- Richardson iteration: truncate to $i$ terms, approximate $x = (I - A)^{-1} b$ with $x^{(i)} = (I + A + \dots + A^i) b$

Slide 22

Richardson Iteration

- Number of terms needed is lower bounded by information propagation: $A^{\text{diameter}} b$
- Highly connected graphs: few terms ok
- Figure: $b$, $Ab$, $A^2 b$
- Need $n$ matrix operations?
- Evaluation (Horner's rule): $(I + A + A^2) b = A(Ab + b) + b$
- $i$ terms: $x^{(0)} = b$, $x^{(i+1)} = A x^{(i)} + b$, using $i$ matrix-vector multiplications
- Can interpret as gradient descent

Slide 23

$(I - A)^{-1} = I + A + A^2 + A^3 + \dots = (I + A)(I + A^2)(I + A^4) \cdots$

- Degree $n$ $\to$ $n$ operations?
- Repeated squaring: $A^{16} = (((A^2)^2)^2)^2$, 4 operations
- $O(\log n)$ terms ok
- Similar to multi-level methods
- Combinatorial view: $A$: step of random walk; $I - A^2$: Laplacian of the 2-step random walk
- Dense matrix! Still a graph Laplacian! Can sparsify!

Slide 24

Repeated Sparse Squaring

- $(I - A)^{-1} = (I + A)(I + A^2)(I + A^4) \cdots$
- Combining known tools: efficiently sparsify $I - A^2$ without computing $A^2$
- [P-Spielman '14]: approximate $L^{-1}$ with $O(\log n)$ sparse matrices
- Key ideas: modify factorization to allow gradual introduction and control of error

Slide 25

Summary

- Algorithm design approach: study problems at the intersection of areas, and extend insights back.
- Can sparsify objects via importance sampling.
- Solve $Lx = b$ efficiently via sparsified squaring.
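The two ways of evaluating the power series for $(I - A)^{-1}$ described on the iterative-methods slides can be compared on a small example. This sketch makes one loud assumption: $A$ is taken to be half of the random-walk matrix of a 4-cycle so that its spectral radius is below 1 and the series converges. It uses dense matrices and no sparsification, so it illustrates only the algebraic identity, not the [P-Spielman '14] algorithm. Function names are hypothetical.

```python
import numpy as np


def richardson(A, b, num_steps):
    """Horner-style iteration x^(0) = b, x^(i+1) = A x^(i) + b.

    After num_steps steps this equals (I + A + ... + A^num_steps) b and
    costs num_steps matrix-vector products.
    """
    x = b.copy()
    for _ in range(num_steps):
        x = A @ x + b
    return x


def squaring_product(A, levels):
    """(I + A)(I + A^2)...(I + A^(2^(levels-1))), which sums A^j for j < 2^levels."""
    n = A.shape[0]
    result = np.eye(n)
    power = A.copy()
    for _ in range(levels):
        result = result @ (np.eye(n) + power)
        power = power @ power  # repeated squaring: A, A^2, A^4, ...
    return result


# Assumption for illustration: a damped random walk on a 4-cycle.
adjacency = np.array(
    [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]], dtype=float)
A = 0.5 * (adjacency / 2.0)  # walk matrix scaled by 1/2, spectral radius 1/2
b = np.array([1.0, 0.0, 0.0, 0.0])

levels = 5
x_series = richardson(A, b, 2 ** levels - 1)  # 31 matrix-vector products
x_product = squaring_product(A, levels) @ b   # 5 factors
x_exact = np.linalg.solve(np.eye(4) - A, b)

print("series vs product gap:", np.max(np.abs(x_series - x_product)))
print("product vs exact gap: ", np.max(np.abs(x_product - x_exact)))
```

Five factors reproduce the same 32-term truncation that takes 31 matrix-vector products term by term. The catch named on the slides is that the squared matrices become dense, which is where sparsification enters.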

Slide 26

Few iterations of $Lx = b$

- [Tutte '61]: graph drawing, embeddings
- [ZGL '03], [ZHS '05]: inference on graphical models
- Inverse powering: eigenvectors / heat kernel:
  - [AM '85]: spectral clustering
  - [OSV '12]: balanced cuts
  - [SM '01], [KMST '09]: image segmentation
- [CFMNPW '14]: Helmholtz decomposition on 3D mesh

Slide 27

Many iterations of $Lx = b$

- [Karmarkar, Ye, Renegar, Nesterov, Nemirovski ]: convex optimization via solving $O(m^{1/2})$ linear systems
- [DS '08]: optimization on graphs $\to$ Laplacian systems
- [KM '09], [MST '14]: random spanning trees
- [CKMST '11]: faster approximate maximum flow
- [KMP '12]: multicommodity flow

Slide 28

MaxFlow

- Areas: combinatorics / discrete algorithms, numerical analysis / optimization, statistics / randomized algorithms
- Maximum flow: first $O(m \, \mathrm{polylog}(n))$ time algorithm for approximate undirected maxflow

Slide 29

Maximum Flow Problem (for unweighted, undirected graphs)

- Given $s$, $t$, find the maximum number of disjoint $s$-$t$ paths
- Dual: separate $s$ and $t$ by removing the fewest edges
- Applications: clustering, image processing, scheduling

Slide 30

What Makes Maxflow Hard

- Highly connected: route up to $n$ paths
- Long paths: a step may involve $n$ vertices
- Each is easy on its own
- Goal: handle both and do better than many steps $\times$ long paths $= n^2$

Slide 31

Algorithms for Flows

Current fastest maxflow algorithms:

- Exact (weakly-polytime): invoke $Lx = b$
- Approximate: modify algorithms for $Lx = b$

[P '14]: $(1 - \epsilon)$-approx maxflow in $O(m \log^c n \, \epsilon^{-2})$ time

Ideas introduced:

- 1970s: blocking flows
- 1980: dynamic trees
- 1986: dual algorithms
- 1989: connections to $Lx = b$
- 2010: few calls to $Lx = b$
- 2013: modify $Lx = b$
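The duality stated on the "Maximum Flow Problem" slide (maximum number of disjoint $s$-$t$ paths equals the fewest edges whose removal separates $s$ from $t$) can be checked on a toy graph. This sketch uses textbook breadth-first augmenting paths and an exponential brute force over 0/1 vectors. Neither is the nearly-linear time method discussed in the talk, and the graph and function names are hypothetical. The brute force evaluates $\sum_{uv} |x_u - x_v|$, which is the 1-norm of $Bx$ for the edge-vertex incidence matrix $B$.

```python
from collections import deque
from itertools import product


def max_flow_unit(n, edges, s, t):
    """Maximum s-t flow in an undirected graph with unit edge capacities."""
    capacity = {}
    adj = [[] for _ in range(n)]
    for u, v in edges:
        # An undirected unit edge is usable in either direction.
        capacity[(u, v)] = capacity.get((u, v), 0) + 1
        capacity[(v, u)] = capacity.get((v, u), 0) + 1
        adj[u].append(v)
        adj[v].append(u)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        flow += 1


def min_cut_bruteforce(n, edges, s, t):
    """Minimize sum |x_u - x_v| over 0/1 vectors x with x_s = 0 and x_t = 1."""
    best = len(edges)
    for x in product((0, 1), repeat=n):
        if x[s] == 0 and x[t] == 1:
            best = min(best, sum(abs(x[u] - x[v]) for u, v in edges))
    return best


# Hypothetical 6-vertex example with source 0 and sink 5.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
flow_value = max_flow_unit(6, edges, 0, 5)
cut_value = min_cut_bruteforce(6, edges, 0, 5)
print("max flow:", flow_value, "min cut:", cut_value)
```

The brute force enumerates every vertex subset, so it is only usable on tiny graphs. It is included to make the cut objective concrete, not as a practical algorithm.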

Slide 32

Maximum Flow in Almost Linear Time

- Algebraic formulation of min $s$-$t$ cut: minimize $\|Bx\|_2$ subject to $x_s = 0$, $x_t = 1$ and $x$ integral
- Algebraic formulation of min $s$-$t$ cut: minimize $\|Bx\|_1$ subject to $x_s = 0$, $x_t = 1$, where $\|\cdot\|_1$ is the 1-norm, the sum of absolute values
- [Sherman '13], [Kelner-Lee-Orecchia-Sidford '13]: can find approximate maxflow iteratively via several calls to a structure approximator
- [Madry '10]: finding an $O(m^{1+\theta})$ sized approximator that requires $O(m^{\theta})$ calls in $O(m^{1+\theta})$ time (for any $\theta > 0$), giving $O(m^{1+2\theta} \epsilon^{-2})$ time
- [Racke-Shah-Taubig '14]: $O(n)$ sized approximator that requires $O(\log^c n)$ iterations, via solving maxflows on graphs of total size $O(m \log^c n)$. $O(m \log^c n \, \epsilon^{-2})$ time?
- Chicken and egg problem: maxflow needs an approximator, and the approximator needs maxflow

Slide 33

Algorithmic Solution

- Ultra-sparsifier (e.g. [Koutis-Miller-P '10]): for any $k$, can find $H$ close to $G$, but equivalent to a graph of size $O(m/k)$
- Key step: vertex reductions via edge reductions
- [P '14]: build approximator on the smaller graph
- Absorb additional (small) error via more calls to approximator
- Recurse on instances with smaller total size, total cost: $O(m \log^c n)$
- [CLMPPS '15]: extends to numerical data, has close connections to variants of Nystrom's method

Slide 34

Summary

- Algorithm design approach: study problems at the intersection of areas, and extend insights back.
- Can sparsify objects via importance sampling.
- Solve $Lx = b$ efficiently via sparsified squaring.
- Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph spa