Multifaceted Algorithm Design Richard Peng M.I.T.

Upload: darrell-wilkins

Post on 17-Jan-2016


Page 1: Multifaceted Algorithm Design Richard Peng M.I.T

Multifaceted Algorithm Design

Richard Peng, M.I.T.

Page 2: Multifaceted Algorithm Design Richard Peng M.I.T

LARGE SCALE PROBLEMS

Emphasis on efficient algorithms in:
• Scientific computing
• Graph theory
• (randomized) numerical routines

Network Analysis

Physical Simulation

Optimization

Page 3: Multifaceted Algorithm Design Richard Peng M.I.T

WELL STUDIED QUESTIONS

Scientific computing: fast solvers for structured linear systems

Graphs / combinatorics: network flow problems

Randomized algorithms: subsampling matrices and optimization formulations

B B’

Page 4: Multifaceted Algorithm Design Richard Peng M.I.T

MY REPRESENTATIVE RESULTS

Lx=b

B B’

Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices

First nearly-linear time algorithm for approximate undirected maxflow

First near-optimal routine for row sampling matrices in a 1-norm preserving manner

Page 5: Multifaceted Algorithm Design Richard Peng M.I.T

RECURRING IDEAS

Can solve a problem by iteratively solving several similar instances

Approximations lead to better approximations

Larger problems can be approximated by smaller ones

Approximator

Data

Page 6: Multifaceted Algorithm Design Richard Peng M.I.T

MY APPROACH TO ALGORITHM DESIGN

Numerical analysis /Optimization

Statistics /Randomized algorithms

Problems at their intersection

Identify problems that arise at the intersection of multiple areas and study them from multiple angles

Combinatorics / Discrete algorithms

This talk: structure-preserving sampling

Page 7: Multifaceted Algorithm Design Richard Peng M.I.T

SAMPLING

Classical use in statistics:
• Extract info from a large data set
• Directly output result (estimator)

Sampling from matrices, networks, and optimization problems:
• Often compute on the sample
• Need to preserve more structure

B B’

Page 8: Multifaceted Algorithm Design Richard Peng M.I.T

PRESERVING GRAPH STRUCTURES

Undirected graph, $n$ vertices, $m < n^2$ edges

Are $n^2$ edges (dense) sometimes necessary?

For some information, e.g. connectivity: encoded by spanning forest, $< n$ edges

Deterministic, O(m) time algorithm
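As a minimal sketch of the connectivity claim, a breadth-first search can extract a spanning forest in time linear in the graph size. The function name `spanning_forest` and the 0..n-1 vertex labeling are assumptions made for this illustration, not part of the talk.

```python
from collections import deque

def spanning_forest(n, edges):
    """Return edges of a spanning forest of an undirected graph.

    n: number of vertices, labeled 0..n-1 (an assumption of this sketch)
    edges: list of (u, v) pairs
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    forest = []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    forest.append((u, v))  # tree edge discovered by BFS
                    queue.append(v)
    return forest

# Triangle 0-1-2 plus a separate edge 3-4: two connected components.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
forest = spanning_forest(5, edges)
```

The forest has n minus (number of components) edges, so it always has fewer than n edges while encoding the same connectivity as the original graph.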


Page 9: Multifaceted Algorithm Design Richard Peng M.I.T

MORE INTRICATE STRUCTURES

k-connectivity: # of disjoint paths between s-t

[Benczur-Karger `96]: for ANY G, can sample to get H with $O(n \log n)$ edges s.t. G ≈ H on all cuts

Stronger: weights of all $2^n$ cuts in graphs

Cut: # of edges leaving a subset of vertices

s

t

Menger’s theorem / maxflow-mincut


≈: multiplicative approximation

Page 10: Multifaceted Algorithm Design Richard Peng M.I.T

HOW TO SAMPLE?

Widely used: uniform sampling

Works well when data is uniform, e.g. complete graph

Problem: long path, removing any edge changes connectivity

(can also have both in one graph)

More systematic view of sampling?

Page 11: Multifaceted Algorithm Design Richard Peng M.I.T

ALGEBRAIC REPRESENTATION OF GRAPHS

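Graph: $n$ vertices, $m$ edges

Graph Laplacian matrix $L$ ($n$ rows / columns, $O(m)$ non-zeros):
• Diagonal: degree
• Off-diagonal: -edge weights

Edge-vertex incidence matrix $B$ ($m$ rows, $n$ columns):
$B_{eu} = \pm 1$ if $u$ is an endpoint of $e$, 0 otherwise

$L$ is the Gram matrix of $B$: $L = B^T B$

Example (3 vertices, 2 edges of weight 1):

$L = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & 1 \end{pmatrix}$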
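The Gram-matrix identity can be checked numerically. The sketch below assumes NumPy, unit edge weights, and a helper name `incidence_matrix` chosen for illustration; the sign convention (+1 at the first endpoint, -1 at the second) is also an assumption, and either orientation yields the same Laplacian.

```python
import numpy as np

def incidence_matrix(n, edges):
    """m x n edge-vertex incidence matrix: row e has +1 and -1 at its two endpoints."""
    B = np.zeros((len(edges), n))
    for e, (u, v) in enumerate(edges):
        B[e, u] = 1.0
        B[e, v] = -1.0
    return B

# Vertex 0 joined to vertices 1 and 2 by unit-weight edges.
edges = [(0, 1), (0, 2)]
B = incidence_matrix(3, edges)
L = B.T @ B  # degrees on the diagonal, -1 for each edge off the diagonal
```

Here `L` has the degrees (2, 1, 1) on its diagonal and -1 in the entries corresponding to the two edges.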

Page 12: Multifaceted Algorithm Design Richard Peng M.I.T

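SPECTRAL SIMILARITY

Numerical analysis: $L_G \approx L_H$ if $x^T L_G x \approx x^T L_H x$ for all vectors $x$

Gram matrix: $L_G = B_G^T B_G$, so $x^T L_G x = \|B_G x\|_2^2$, where $\|y\|_2^2 = \sum_i y_i^2$

Equivalently: $\|B_G x\|_2 \approx \|B_H x\|_2$ ∀ x

$B_{eu} = \pm 1$ if $u$ is an endpoint of $e$, 0 otherwise

For edge $e = uv$: $(B_{e:} x)^2 = (x_u - x_v)^2$

$x \in \{0, 1\}^V$: $\|B_G x\|_2^2$ = size of cut given by $x$, so G ≈ H on all cuts

(Figure: vertices labeled $x_u = 1$, $x_z = 1$, $x_v = 0$; an edge between values 1 and 0 contributes $(1-0)^2 = 1$, an edge between values 1 and 1 contributes $(1-1)^2 = 0$.)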
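To make the cut interpretation concrete, the following sketch evaluates $\|Bx\|_2^2$ for 0/1 indicator vectors on a small cycle. It assumes NumPy and unit edge weights; `cut_size` is a hypothetical helper name.

```python
import numpy as np

def cut_size(n, edges, subset):
    """Count edges leaving `subset` by evaluating ||Bx||_2^2 for its 0/1 indicator x."""
    x = np.zeros(n)
    x[list(subset)] = 1.0
    # (Bx)_e = x_u - x_v for edge e = (u, v); it is nonzero exactly when e crosses the cut.
    Bx = np.array([x[u] - x[v] for u, v in edges])
    return float(Bx @ Bx)

# 4-cycle 0-1-2-3-0.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
single = cut_size(4, cycle, {0})          # edges (0,1) and (3,0) cross
adjacent = cut_size(4, cycle, {0, 1})     # edges (1,2) and (3,0) cross
alternating = cut_size(4, cycle, {0, 2})  # every edge crosses
```

Preserving the quadratic form for all real vectors is therefore at least as strong as preserving all cuts, since the 0/1 vectors are a special case.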

Page 13: Multifaceted Algorithm Design Richard Peng M.I.T

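ALGEBRAIC VIEW OF SAMPLING EDGES

L2 Row sampling:

Given $B$ ($m$ rows, $n$ columns) with $m \gg n$, sample a few rows to form $B’$ (≈ $n$ rows) s.t. $\|Bx\|_2 \approx \|B’x\|_2$ ∀ x

Note: normally use A instead of B, n and d instead of m and n

(Figure: a row (0, -1, 0, 0, 0, 1, 0) of $B$ appears in $B’$ rescaled as (0, -5, 0, 0, 0, 5, 0).)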

Page 14: Multifaceted Algorithm Design Richard Peng M.I.T

IMPORTANCE SAMPLING

Keep a row, $b_i$, with probability $p_i$; rescale if kept to maintain expectation

Uniform sampling: $p_i = 1/k$ for a factor $k$ size reduction

Issue: only one non-zero row

Norm sampling: $p_i = \frac{m}{k} \cdot \frac{\|b_i\|_2^2}{\|B\|_F^2}$

Issue: column with one entry
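A minimal sketch of the keep-and-rescale step and of the norm-sampling probabilities, assuming NumPy. The names `sample_rows` and `norm_sampling_probs` are made up for this illustration, and the scaling by $1/\sqrt{p_i}$ is the standard choice that makes $E[B'^T B'] = B^T B$; the slides only say "rescale to maintain expectation".

```python
import numpy as np

def sample_rows(B, p, rng):
    """Keep row i with probability p[i]; rescale kept rows by 1/sqrt(p[i]).

    With this scaling, E[B'^T B'] = sum_i p_i * (b_i b_i^T / p_i) = B^T B.
    """
    p = np.asarray(p, dtype=float)
    keep = rng.random(len(p)) < p
    return B[keep] / np.sqrt(p[keep])[:, None]

def norm_sampling_probs(B, k):
    """p_i = (m/k) * ||b_i||_2^2 / ||B||_F^2, capped at 1."""
    m = B.shape[0]
    row_norms_sq = np.sum(B * B, axis=1)
    return np.minimum(1.0, (m / k) * row_norms_sq / row_norms_sq.sum())

rng = np.random.default_rng(0)
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
p = norm_sampling_probs(B, k=2)   # squared row norms are 1, 1, 2, 0
B_sampled = sample_rows(B, p, rng)
```

Rows with probability 1 are always kept unchanged, and rows with probability 0 (here the all-zero row) are never kept.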

Page 15: Multifaceted Algorithm Design Richard Peng M.I.T

THE `RIGHT’ PROBABILITIES

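$b_i$: row $i$ of $B$, $L = B^T B$

τ: L2 statistical leverage scores

$\tau_i = b_i^T (B^T B)^{-1} b_i = \|b_i\|_{L^{-1}}^2$

Examples:
• Only one non-zero row: τ = (0, 0, 1, 0, 0)
• Column with one entry: τ = (n/m, n/m, n/m, n/m, 1)
• Path + clique: 1 on path edges, 1/n on clique edges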
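The path + clique example can be reproduced numerically. This sketch assumes NumPy and uses a pseudoinverse in place of $(B^T B)^{-1}$, because a graph Laplacian is singular; the helper names and the specific 8-vertex graph are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def incidence_matrix(n, edges):
    """m x n edge-vertex incidence matrix with +1 / -1 at the endpoints of each edge."""
    B = np.zeros((len(edges), n))
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
    return B

def leverage_scores(B):
    """tau_i = b_i^T (B^T B)^+ b_i, with a pseudoinverse since B^T B is singular."""
    L_pinv = np.linalg.pinv(B.T @ B, rcond=1e-10)
    return np.einsum("ij,jk,ik->i", B, L_pinv, B)

# Clique on vertices 0..4 with a path 4-5-6-7 attached.
clique = list(combinations(range(5), 2))   # 10 edges
path = [(4, 5), (5, 6), (6, 7)]            # 3 edges
B = incidence_matrix(8, clique + path)
tau = leverage_scores(B)
```

Path edges get score 1 (removing any of them disconnects the graph), while each edge of the 5-clique gets 2/5, which is on the order of 1/n for a clique on n vertices. The scores sum to 7, the rank of `B`.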

Page 16: Multifaceted Algorithm Design Richard Peng M.I.T

L2 MATRIX-CHERNOFF BOUNDS

τ: L2 statistical leverage scores

$\tau_i = b_i^T (B^T B)^{-1} b_i = \|b_i\|_{L^{-1}}^2$

[Foster `49]: $\sum_i \tau_i = \mathrm{rank} \le n$ ⟹ $O(n \log n)$ rows

[Rudelson, Vershynin `07], [Tropp `12]: sampling with $p_i \ge \tau_i \cdot O(\log n)$ gives $B’$ s.t. $\|Bx\|_2 \approx \|B’x\|_2$ ∀ x w.h.p.

Near optimal:
• L2-row samples of B
• Graph sparsifiers

• In practice, replacing $O(\log n)$ with 5 usually suffices
• Can also improve via derandomization
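The row count follows from the sum of leverage scores by a one-line expectation calculation. As a worked sketch, assume each row is kept independently with $p_i = c\,\tau_i \log n$ for a hypothetical constant $c$, and that no $p_i$ exceeds 1:

```latex
\mathbb{E}\bigl[\#\text{rows of } B'\bigr]
  = \sum_i p_i
  = c \log n \sum_i \tau_i
  = c \log n \cdot \operatorname{rank}(B)
  \le c \, n \log n .
```

If some $p_i$ would exceed 1 they are capped at 1, which only lowers the expected count.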

Page 17: Multifaceted Algorithm Design Richard Peng M.I.T

MY APPROACH TO ALGORITHM DESIGN

Extend insights gained from studying problems at the intersection of multiple areas back to these areas

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

Problems at their intersection

Algorithmic extensions of structure-preserving sampling

Maximum flow

Solving linear systems

Preserving L1-structures

Page 18: Multifaceted Algorithm Design Richard Peng M.I.T

SUMMARY

• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.

Page 19: Multifaceted Algorithm Design Richard Peng M.I.T

Solvers for linear systems involving graph Laplacians

Lx = b

Graph Laplacian:
• Diagonal: degree
• Off-diagonal: -weight

Current fastest sequential and parallel solvers for linear systems in graph Laplacians

Application: estimate all $\tau_i = \|b_i\|_{L^{-1}}^2$ by solving $O(\log n)$ linear systems

Directly related to:
• Elliptic problems
• SDD, M, and H-matrices

Combinatorics / Discrete algorithms

Numerical analysis / Optimization

Statistics / Randomized algorithms

Page 20: Multifaceted Algorithm Design Richard Peng M.I.T

ALGORITHMS FOR Lx = b

Given any graph Laplacian L with n vertices and m edges, any vector b, find vector x s.t. Lx = b

[Vaidya `89]: use graph theory!

[Spielman-Teng `04]: $O(m \log^c n)$

Log-log plot of c by year:
• 2004: 70
• 2006: 32
• 2009: 15
• 2010: 6
• 2010: 2
• 2011: 1
• 2014: 1/2

[P-Spielman `14]: alternate, fully parallelizable approach

(Legend: my results / previous works / questions)

Page 21: Multifaceted Algorithm Design Richard Peng M.I.T

ITERATIVE METHODS

Division using multiplication:

$I + A + A^2 + A^3 + \dots = (I - A)^{-1} = L^{-1}$

Spectral theorem: can view as scalars

Simplification: assume $L = I - A$, with $A$ the transition matrix of a random walk

Richardson iteration: truncate to $i$ terms, approximating $x = (I - A)^{-1} b$ with $x^{(i)} = (I + A + \dots + A^i) b$

Page 22: Multifaceted Algorithm Design Richard Peng M.I.T

RICHARDSON ITERATION

# of terms needed is lower bounded by information propagation: $A^{\text{diameter}} b$

Highly connected graphs: few terms ok

(Figure: $b$, $Ab$, $A^2 b$)

Need $n$ matrix operations?

Evaluation (Horner’s rule):
• $(I + A + A^2) b = A(Ab + b) + b$
• $i$ terms: $x^{(0)} = b$, $x^{(i+1)} = A x^{(i)} + b$

i matrix-vector multiplications

Can interpret as gradient descent
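The Horner-style evaluation can be sketched in a few lines, assuming NumPy. The 2x2 test matrix below is a hypothetical example chosen with spectral norm 1/2 so that the series converges; it is not taken from the talk.

```python
import numpy as np

def richardson(A, b, num_terms):
    """Evaluate (I + A + ... + A^num_terms) b using num_terms matrix-vector products."""
    x = b.copy()
    for _ in range(num_terms):
        x = A @ x + b  # Horner step: x(i+1) = A x(i) + b
    return x

# Hypothetical test matrix with eigenvalues +1/2 and -1/2.
A = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 2.0])

x_two_terms = richardson(A, b, 2)   # equals b + A b + A^2 b
x = richardson(A, b, 60)
exact = np.linalg.solve(np.eye(2) - A, b)
```

Each step costs one matrix-vector multiplication, and the error shrinks by the spectral norm of `A` per step, which is why poorly connected graphs (norm close to 1) need many terms.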

Page 23: Multifaceted Algorithm Design Richard Peng M.I.T

DEGREE N ⟹ N OPERATIONS?

$(I - A)^{-1} = I + A + A^2 + A^3 + \dots = (I + A)(I + A^2)(I + A^4) \cdots$

Repeated squaring: $A^{16} = (((A^2)^2)^2)^2$, 4 operations

• $O(\log n)$ terms ok
• Similar to multi-level methods

Combinatorial view:
• $A$: step of random walk
• $I - A^2$: Laplacian of the 2-step random walk

Dense matrix!

Still a graph Laplacian!

Can sparsify!
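The product form can be checked with dense matrices. This sketch assumes NumPy and a hypothetical small test matrix; it performs no sparsification, so it illustrates only the algebraic identity and not the sparsified algorithm.

```python
import numpy as np

def inverse_by_squaring(A, num_factors):
    """Multiply out (I + A)(I + A^2)(I + A^4)... using num_factors factors.

    The product of k factors equals I + A + A^2 + ... + A^(2^k - 1).
    """
    n = A.shape[0]
    result = np.eye(n)
    power = A.copy()               # holds A^(2^j) at step j
    for _ in range(num_factors):
        result = result @ (np.eye(n) + power)
        power = power @ power      # repeated squaring
    return result

# Hypothetical test matrix with spectral norm 1/2.
A = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
two_factors = inverse_by_squaring(A, 2)   # I + A + A^2 + A^3
approx_inverse = inverse_by_squaring(A, 6)
exact_inverse = np.linalg.inv(np.eye(2) - A)
```

Six factors already cover the first 64 terms of the power series, which is the exponential saving over term-by-term evaluation.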

Page 24: Multifaceted Algorithm Design Richard Peng M.I.T

REPEATED SPARSE SQUARING

Combining known tools: efficiently sparsify $I - A^2$ without computing $A^2$

$(I - A)^{-1} = (I + A)(I + A^2)(I + A^4) \cdots$

[P-Spielman `14]: approximate $L^{-1}$ with $O(\log n)$ sparse matrices

key ideas: modify factorization to allow gradual introduction and control of error

Page 25: Multifaceted Algorithm Design Richard Peng M.I.T

SUMMARY

• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx=b efficiently via sparsified squaring.

Page 26: Multifaceted Algorithm Design Richard Peng M.I.T

FEW ITERATIONS OF Lx = b

• [Tutte `61]: graph drawing, embeddings
• [ZGL `03], [ZHS `05]: inference on graphical models

Inverse powering: eigenvectors / heat kernel:
• [AM `85]: spectral clustering
• [OSV `12]: balanced cuts
• [SM `01][KMST `09]: image segmentation

[CFMNPW`14]: Helmholtz decomp. on 3D mesh

Page 27: Multifaceted Algorithm Design Richard Peng M.I.T

MANY ITERATIONS OF Lx = b

[Karmarkar, Ye, Renegar, Nesterov, Nemirovski …]: convex optimization via solving $O(m^{1/2})$ linear systems

[DS `08]: optimization on graphs ⟹ Laplacian systems

[KM `09][MST`14]: random spanning trees

[CKMST `11]: faster approx maximum flow

[KMP `12]: multicommodity flow

Page 28: Multifaceted Algorithm Design Richard Peng M.I.T

MAXFLOW

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

Maximum flow

First $O(m \, \mathrm{polylog}(n))$ time algorithm for approximate undirected maxflow

Page 29: Multifaceted Algorithm Design Richard Peng M.I.T

MAXIMUM FLOW PROBLEM

(for unweighted, undirected graphs)

Given s, t, find the maximum number of disjoint s-t paths

Dual: separate s and t by removing fewest edges

Applications:
• Clustering
• Image processing
• Scheduling

Page 30: Multifaceted Algorithm Design Richard Peng M.I.T

WHAT MAKES MAXFLOW HARD

Highly connected: route up to $n$ paths

Long paths: a step may involve $n$ vertices

Each is ‘easy’ on its own

Goal: handle both and do better than many steps × long paths = $n^2$

Page 31: Multifaceted Algorithm Design Richard Peng M.I.T

ALGORITHMS FOR FLOWS

Current fastest maxflow algorithms:
• Exact (weakly-polytime): invoke Lx=b
• Approximate: modify algorithms for Lx=b

[P`14]: (1 – ε)-approx maxflow in $O(m \log^c n \, \epsilon^{-2})$ time

Ideas introduced:
• 1970s: blocking flows
• 1980: dynamic trees
• 1986: dual algorithms
• 1989: connections to Lx = b
• 2010: few calls to Lx = b
• 2013: modify Lx = b

Page 32: Multifaceted Algorithm Design Richard Peng M.I.T

MAXIMUM FLOW IN ALMOST LINEAR TIME

Algebraic formulation of min s-t cut:
Minimize $\|Bx\|_2$ subject to $x_s = 0$, $x_t = 1$ and $x$ integral

Algebraic formulation of min s-t cut:
Minimize $\|Bx\|_1$ subject to $x_s = 0$, $x_t = 1$

$\|\cdot\|_1$: 1-norm, sum of absolute values

[Madry `10]: finding $O(m^{1+\theta})$-sized approximator that requires $O(m^{\theta})$ calls, in $O(m^{1+\theta})$ time (for any θ > 0)

$O(m^{1+2\theta} \epsilon^{-2})$ time

[Racke-Shah-Taubig `14]: $O(n)$-sized approximator that requires $O(\log^c n)$ iterations, via solving maxflows on graphs of total size $O(m \log^c n)$

$O(m \log^c n \, \epsilon^{-2})$ time?

Chicken and egg problem: Maxflow ⇄ Approximator

[Sherman `13] [Kelner-Lee-Orecchia-Sidford `13]: can find approximate maxflow iteratively via several calls to a structure approximator
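To illustrate the algebraic formulation only, the sketch below brute-forces the minimum of $\|Bx\|_1$ over 0/1 vectors with $x_s = 0$ and $x_t = 1$. It takes exponential time and is not any of the algorithms cited on this slide; the function name and the example graph are assumptions.

```python
from itertools import product

def min_st_cut(n, edges, s, t):
    """Brute-force min ||Bx||_1 over 0/1 vectors x with x[s] = 0 and x[t] = 1."""
    best_value, best_x = None, None
    free = [v for v in range(n) if v not in (s, t)]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        x[t] = 1
        for v, bit in zip(free, bits):
            x[v] = bit
        # ||Bx||_1 = sum over edges of |x_u - x_v| = number of edges crossing the cut.
        value = sum(abs(x[u] - x[v]) for u, v in edges)
        if best_value is None or value < best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Two s-t paths 0-1-3 and 0-2-3, plus a cross edge 1-2.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
cut_value, cut_side = min_st_cut(4, edges, s=0, t=3)
```

On this graph the minimum is 2, matching the two edge-disjoint s-t paths, as the flow/cut duality predicts.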

Page 33: Multifaceted Algorithm Design Richard Peng M.I.T

ALGORITHMIC SOLUTION

Ultra-sparsifier (e.g. [Koutis-Miller-P `10]): for any k, can find H close to G, but equivalent to graph of size $O(m/k)$

Absorb additional (small) error via more calls to approximator

Recurse on instances with smaller total size, total cost: $O(m \log^c n)$

Key step: vertex reductions via edge reductions

[P`14]: build approximator on the smaller graph

[CLMPPS`15]: extends to numerical data, has close connections to variants of Nystrom’s method

Page 34: Multifaceted Algorithm Design Richard Peng M.I.T

SUMMARY

• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx=b efficiently via sparsified squaring.
• Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph sparsifiers.

Page 35: Multifaceted Algorithm Design Richard Peng M.I.T

RANDOMIZED NUMERICAL LINEAR ALGEBRA

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

L1-preserving row sampling

B B’

First near-optimal routine for row sampling matrices in a 1-norm preserving manner

Page 36: Multifaceted Algorithm Design Richard Peng M.I.T

GENERALIZATION

Generalization of row sampling: given A, q, find A’ s.t. $\|Ax\|_q \approx \|A’x\|_q$ ∀ x

q-norm: $\|y\|_q = (\sum_i |y_i|^q)^{1/q}$

(Figure labels: $\|y\|_1$, $\|y\|_2$)

1-norm: standard for representing cuts, used in sparse recovery / robust regression

Applications (for general A):
• Feature selection
• Low rank approximation / PCA

Page 37: Multifaceted Algorithm Design Richard Peng M.I.T

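ROW SAMPLING ROUTINES

Find A’ s.t. $\|Ax\|_q \approx \|A’x\|_q$ ∀ x; nnz: # of non-zeros in A

| Reference | # rows for q=2 | # rows for q=1 | Runtime |
|---|---|---|---|
| Dasgupta et al. `09 | - | $n^{2.5}$ | $mn^5$ |
| Magdon-Ismail `10 | $n \log^2 n$ | - | $mn^2$ |
| Sohler-Woodruff `11 | - | $n^{3.5}$ | $mn^{\omega-1+\theta}$ |
| Drineas et al. `12 | $n \log n$ | - | $mn \log n$ |
| Clarkson et al. `12 | - | $n^{4.5} \log^{1.5} n$ | $mn \log n$ |
| Clarkson-Woodruff `12 | $n^2 \log n$ | $n^8$ | nnz |
| Mahoney-Meng `12 | $n^2$ | $n^{3.5}$ | nnz + $n^6$ |
| Nelson-Nguyen `12 | $n^{1+\theta}$ | - | nnz |
| Li et al. `13, Cohen et al. `14 | $n \log n$ | $n^{3.66}$ | nnz + $n^{\omega+\theta}$ |

Omitting corresponding empirical studies

[Naor `11][Matousek `97]: on graphs, L2 approx ⟹ Lq approx ∀ 1 ≤ q ≤ 2

How special are graphs?

How special is L2?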

Page 38: Multifaceted Algorithm Design Richard Peng M.I.T

L1 ROW SAMPLING

L1 Lewis weights ([Lewis `78]):

$w$ s.t. $w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i$, where $W = \mathrm{diag}(w)$

Recursive definition!

Sampling with $p_i \ge w_i \cdot O(\log n)$ gives $\|Ax\|_1 \approx \|A’x\|_1$ ∀ x

Can check: $\sum_i w_i \le n$ ⟹ $O(n \log n)$ rows

[Talagrand `90, “Embedding subspaces of L1 into $\ell_1^N$”] can be analyzed as row-sampling / sparsification

Page 39: Multifaceted Algorithm Design Richard Peng M.I.T

[COHEN-P `14]

Update w on LHS with w on RHS:

$w’_i \leftarrow (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}$

| q | Previous # of rows | New # of rows | Runtime |
|---|---|---|---|
| 1 | $n^{2.5}$ | $n \log n$ | nnz + $n^{\omega+\theta}$ |
| 1 < q < 2 | $n^{q/2+2}$ | $n \log n (\log\log n)^2$ | nnz + $n^{\omega+\theta}$ |
| 2 < q | $n^{q+1}$ | $n^{q/2} \log n$ | nnz + $n^{q/2+O(1)}$ |

Converges in $\log\log n$ steps: analyze $A^T W^{-1} A$ spectrally

Aside: similar to iterative reweighted least squares

Elementary, optimization motivated proof of w.h.p. concentration for L1
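The fixed-point update for L1 Lewis weights can be sketched as follows, assuming NumPy. The function name, the all-ones starting point, the fixed iteration count of 50, and the random test matrix are all assumptions of this illustration rather than details from the talk.

```python
import numpy as np

def l1_lewis_weights(A, num_iters=50):
    """Iterate w_i <- (a_i^T (A^T W^{-1} A)^{-1} a_i)^{1/2}, starting from w = 1."""
    m, d = A.shape
    w = np.ones(m)
    for _ in range(num_iters):
        M = A.T @ (A / w[:, None])   # A^T W^{-1} A with W = diag(w)
        M_inv = np.linalg.inv(M)
        w = np.sqrt(np.einsum("ij,jk,ik->i", A, M_inv, A))
    return w

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))     # hypothetical input with full column rank
w = l1_lewis_weights(A)

# Recompute the right-hand side once more to check the fixed-point equation.
M_inv = np.linalg.inv(A.T @ (A / w[:, None]))
rhs = np.einsum("ij,jk,ik->i", A, M_inv, A)
```

At the fixed point, each $w_i$ equals the leverage score of row $i$ of $W^{-1/2} A$, so the weights sum to the number of columns when `A` has full column rank.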

Page 40: Multifaceted Algorithm Design Richard Peng M.I.T

SUMMARY

• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx=b efficiently via sparsified squaring.
• Approximate maximum flow routines and cut-approximators can be constructed recursively from each other via graph sparsifiers.
• Wider ranges of structures can be sparsified, key statistical quantities can be computed iteratively.

Page 41: Multifaceted Algorithm Design Richard Peng M.I.T

I’VE ALSO WORKED ON

• Dynamic graph data structures
• Graph partitioning
• Parallel algorithms
• Image processing
• Anomaly / sybil detection in graphs

Page 42: Multifaceted Algorithm Design Richard Peng M.I.T

FUTURE WORK: LINEAR SYSTEM SOLVERS

• Wider classes of linear systems
• Relation to optimization / learning

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

Mx=b

Solvers for linear systems involving graph Laplacians

Page 43: Multifaceted Algorithm Design Richard Peng M.I.T

FUTURE WORK: COMBINATORIAL OPTIMIZATION

Faster algorithms for more classical algorithmic graph theory problems?

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

Maximum flow

Page 44: Multifaceted Algorithm Design Richard Peng M.I.T

FUTURE WORK: RANDOMIZED NUMERICAL LINEAR ALGEBRA

• Other algorithmic applications of Lewis weights?
• Low-rank approximation in L1?
• $O(n)$-sized L1-preserving row samples? (these exist for L2)

Combinatorics / Discrete algorithms

Numerical analysis /Optimization

Statistics /Randomized algorithms

L1-preserving row sampling

B B’

Page 45: Multifaceted Algorithm Design Richard Peng M.I.T

SUMMARY

Combinatorics / Discrete algorithms

Numerical analysis / Optimization

Statistics /Randomized algorithms

Problems at their intersection

B B’

Links to arXiv manuscripts and videos of more detailed talks are at:

math.mit.edu/~rpeng/

Mx=b