An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)


TRANSCRIPT

Page 1:

An Efficient Parallel Solver for SDD Linear Systems

Richard Peng, M.I.T.

Joint work with Dan Spielman (Yale)

Page 2:

Efficient Parallel Solvers for SDD Linear Systems

Richard Peng, M.I.T.

Work in progress with Dehua Cheng (USC), Yu Cheng (USC), Yintat Lee (MIT), Yan Liu (USC), Dan Spielman (Yale), and Shanghua Teng (USC)

Page 3:

OUTLINE

• LGx = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms

Page 4:

LARGE GRAPHS

Examples: images, meshes, roads, social networks

Algorithmic challenges:
• How to store?
• How to analyze?
• How to optimize?

Page 5:

GRAPH LAPLACIAN

• Row/column: vertex
• Off-diagonal: -weight
• Diagonal: weighted degree

Input: graph Laplacian L, vector b
Output: vector x s.t. Lx ≈ b

Lx = b, with n vertices and m edges
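To make the definition concrete, here is a minimal sketch that assembles the Laplacian of a small hypothetical weighted graph (the edges and weights are illustrative, not from the talk) and checks the two defining properties:

```python
import numpy as np

# Hypothetical weighted graph: edge list of (u, v, weight)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0)]
n = 3

# Off-diagonal: minus the edge weight; diagonal: weighted degree
L = np.zeros((n, n))
for u, v, w in edges:
    L[u, v] -= w
    L[v, u] -= w
    L[u, u] += w
    L[v, v] += w

row_sums = L @ np.ones(n)        # Laplacian rows sum to zero
eigs = np.linalg.eigvalsh(L)     # and L is positive semidefinite
```

The two checked properties (zero row sums, positive semidefiniteness) are what place L in the SDD class the solver targets.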

Page 6:

THE LAPLACIAN PARADIGM

Lx = b

• Directly related: elliptic systems
• Few iterations: eigenvectors, heat kernels
• Many iterations / modified algorithms: graph problems, image processing

Page 7: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

Direct Methods: O(n3)O(n2.3727)Iterative methods: O(nm), O(mκ1/2)Combinatorial Preconditioning• [Vaidya`91]: O(m7/4)• [Boman-Hendrickson`01]: O(mn)• [Spielman-Teng `03, `04]: O(m1.31)O(mlogcn)• [KMP`10][KMP`11][KOSZ 13][LS`13]

[CKMPPRX`14]: O(mlog2n)O(mlog1/2n)

SOLVERS

Lx=b1

1

2

n x n matrixm non-zeros

Page 8:

PARALLEL SPEEDUPS

Speedups by splitting work:
• Time: max # of dependent steps
• Work: # of operations

Common architectures: multicore, MapReduce

Nearly-linear work parallel Laplacian solvers:
• [KM `07]: O(n^(1/6+a)) for planar graphs
• [BGKMPT `11]: O(m^(1/3+a))

Page 9:

OUR RESULT

Input: graph Laplacian LG with condition number κ
Output: access to an operator Z s.t. Z ≈ε LG^-1

Cost: O(log^c1 m log^c2 κ log(1/ε)) depth, O(m log^c1 m log^c2 κ log(1/ε)) work

Note: LG is low rank; pseudoinverses are omitted.

• Logarithmic dependency on error
• κ ≤ O(n^2 wmax/wmin)

Extension: sparse approximation of LG^p for any -1 ≤ p ≤ 1, with poly(1/ε) dependency

Page 10:

SUMMARY

• Would like to solve LGx = b

• Goal: polylog depth, nearly-linear work

Page 11:

OUTLINE

• LGx = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms

Page 12:

EXTREME INSTANCES

• Highly connected: need global steps
• Long paths / trees: need many steps

Each is easy on its own (iterative methods, Gaussian elimination); solvers must handle both simultaneously.

Page 13:

PREVIOUS FAST ALGORITHMS

Ingredients: combinatorial preconditioning, spectral sparsification, low-stretch spanning trees, local partitioning, tree contraction, tree routing.

• Reduce G to a sparser G'
• Terminate at a spanning tree T

'Driver': iterative methods
• Polynomial in LG LT^-1
• Need: LG^-1 LT = (LG LT^-1)^-1
• Horner's method: degree d gives O(d log n) depth
• [Spielman-Teng `04]: d ≈ n^(1/2)
• Fast due to sparser graphs; focus of subsequent improvements

Page 14: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

If |a| ≤ ρ, κ = (1-ρ)-1 terms give good approximation to (1 – a)-1

POLYNOMIAL APPROXIMATIONS

Division with multiplication:(1 – a)-1 = 1 + a + a2 + a3 + a4 + a5…

• Spectral theorem: this works for marices!• Better: Chebyshev / heavy ball:

d = O(κ1/2) sufficient Optimal ([OSV `12])Exists G (,e.g. cycle)

where κ(LGLT-1) needs to

be Ω(n)

Ω(n1/2) lower bound on depth?
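A scalar sanity check of the term-count claim above; ρ and the multiple of κ are arbitrary choices for illustration:

```python
# Truncating (1 - a)^{-1} = 1 + a + a^2 + ... after d terms leaves a
# relative error of exactly rho^d when a = rho, so d must grow like
# kappa = (1 - rho)^{-1} before the error becomes small.
rho = 0.99
kappa = 1 / (1 - rho)                    # condition-number proxy, = 100
d = int(5 * kappa)                       # a few multiples of kappa
approx = sum(rho**k for k in range(d))   # truncated Neumann series
exact = 1 / (1 - rho)
rel_err = abs(approx - exact) / exact    # equals rho^d
```

With d a small constant times κ the relative error drops below 1%; with d much smaller than κ it stays near 1.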

Page 15: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

LOWER BOUND FOR LOWER BOUND

[BGKMPT `11]: O(m1/3+a) via. (pseudo) inverse:• Preprocess: O(log2n) depth, O(nω) work• Solve: O(logn) depth, O(n2) work

• Inverse is dense, expensive to use

• Only use on O(n1/3) sized instancesPossible improvement: can we make LG

-1 sparse?

Multiplying by LG-

1 is highly parallel!

[George `73][LRT `79]:yes for planar graphs

Page 16:

SUMMARY

• Would like to solve LGx = b
• Goal: polylog depth, nearly-linear work
• 'Standard' numerical methods have high depth
• Equivalent: sparse inverse representations

Aside: cut approximation / oblivious routing schemes by [Madry `10][Sherman `13][KLOS `13] are parallel and can be viewed as asynchronous iterative methods.

Page 17:

OUTLINE

• LGx = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms

Page 18:

DEGREE d POLYNOMIAL ⇒ DEPTH d?

Apply to the power series:
(1 – a)^-1 = 1 + a + a^2 + a^3 + a^4 + a^5 + a^6 + a^7 + … = (1 + a)(1 + a^2)(1 + a^4)…

• a^16 = (((a^2)^2)^2)^2
• Repeated squaring sidesteps the assumption in the lower bound!

Matrix version: I + A^(2^i)
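The telescoping product can be checked numerically; a minimal sketch with a random symmetric matrix of spectral norm ½ (size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
B = (B + B.T) / 2                        # symmetric test matrix
A = B / (2 * np.linalg.norm(B, 2))       # spectral norm 1/2, so the series converges
I = np.eye(5)

# (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)...: 6 factors cover 64 series terms
prod = I.copy()
P = A.copy()
for _ in range(6):
    prod = prod @ (I + P)
    P = P @ P                            # repeated squaring: A, A^2, A^4, ...

err = np.linalg.norm(prod - np.linalg.inv(I - A))
```

Six factors already approximate the inverse to machine precision, versus the 64 sequential terms the plain series would need.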

Page 19:

REDUCTION TO (I – A)^-1

• Adjust/rescale so that the diagonal = I
• Add to diag(L) to make it full rank

A: weighted degrees < 1; a random walk with |A| < 1

Page 20:

INTERPRETATION

A: one-step transition of a random walk
A^(2^i): 2^i-step transition of the random walk

One step of the walk on each Ai = A^(2^i):

(I – A)^-1 = (I + A)(I + A^2)…(I + A^(2^i))…

• O(log κ) matrix multiplications, until A^(2^i) becomes an 'expander'
• O(n^ω log κ log n) work

Need: size reductions

Page 21:

SIMILAR TO

                    Connectivity    Parallel Solver
Iteration           Ai+1 ≈ Ai^2     Ai+1 ≈ Ai^2
Until               |Ad| small      |Ad| small
Size reduction      Low degree      Sparse graph
Method              Derandomized    Randomized
Solution transfer   Connectivity    (I – Ai)xi = bi

• Multiscale methods
• NC algorithm for shortest path
• Logspace connectivity: [Reingold `02]
• Deterministic squaring: [Rozenman-Vadhan `05]

Page 22:

SUMMARY

• Would like to solve LGx = b
• Goal: polylog depth, nearly-linear work
• 'Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound

Page 23:

OUTLINE

• LGx = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms

Page 24:

WHAT IS AN ALGORITHM

Input b → Z → output x

• b → x: a linear operator, Z
• Algorithm = matrix Z ≈ε (I – A)^-1

Goal: Z = sum/product of a few matrices

• ≈ε: spectral similarity with relative error ε
• Symmetric, invertible, composable (additive)

Page 25:

SQUARING

• [BSS `09]: there exists I – A' ≈ε I – A^2 with O(nε^-2) entries
• [ST `04][SS `08][OV `11] + some modifications: O(n log^c n ε^-2) entries, efficient, parallel
• [Koutis `14]: faster algorithm based on spanners / low-diameter decompositions

Page 26:

APPROXIMATE INVERSE CHAIN

I – A1 ≈ε I – A0^2
I – A2 ≈ε I – A1^2
…
I – Ai ≈ε I – Ai-1^2
…
I – Ad ≈ I

• Convergence without error: |Ai+1| < |Ai| / 2
• With I – Ai+1 ≈ε I – Ai^2: |Ai+1| < |Ai| / 1.5
• d = O(log κ)

Page 27: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

ISSUE 1

Only have 1 – ai+1 ≈ 1 – ai

2Solution: apply one at a time

(1 – ai)-1 = (1 + ai)(1 – ai2)-1

≈ (1 + ai)(1 – ai+1)-1

Induction: zi+1 ≈ (1 – ai+1)-1

I - A0

I - Ad≈ I

zi = (1 + ai) zi+1 ≈ (1 + ai)(1 – ai+1)-1 ≈(1 – ai)-1

Need to invoke: (1 – a)-1

= (1 + a) (1 + a2) (1 + a4)…

zd = (1 – ad)-1 ≈ 1

Page 28: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

ISSUE 2

In matrix setting, replacements by approximations need to be symmetric:

Z ≈ Z’ UTZU ≈ UTZ’U

In Zi, terms around (I - Ai2)-1 ≈

Zi+1 needs to be symmetric

(I – Ai) Zi+1 is not symmetric around Zi+1

Solution 1 ([PS `14]):(1 – a)-1=1/2 ( 1 + (1 + a)(1 – a2)-1(1 + a))

Page 29: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

ALGORITHM

Zi+1 ≈ α+ε (1 – Ai2)-1

(I – Ai)-1 = ½ [I+(1 + Ai) (I – Ai2)-1 (1

+ Ai)]

• Composition: Zi ≈ α+ε (I – Ai)-1

• Total error = dε= O(logκε)

Chain: (I – Ai+1)-1 ≈ε (I – Ai2)-

1

Zi ½ [I+(1 + Ai) Zi+1(I + Ai)]

Induction: Zi+1 ≈α (I – Ai+1) -1

Page 30: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

PSEUDOCODE

x = Solve(I, A0, … Ad, b)

1. For i from 1 to d,set bi = (I + Ai) bi-1.

2. Set xd = bd.

3. For i from d - 1 downto 0,

set xi = ½[bi+(I +Ai)xi+1].
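The pseudocode can be exercised end to end. The sketch below uses an exact chain Ai+1 = Ai^2 (no sparsification, small dense matrices), so the recovered x should essentially match a direct solve; the size, seed, and chain length are arbitrary:

```python
import numpy as np

def solve_chain(As, b):
    """Apply Z ≈ (I - A0)^{-1} to b, given the chain A0, A1, ..., Ad."""
    d = len(As) - 1
    I = np.eye(len(b))
    # Forward pass: b_i = (I + A_{i-1}) b_{i-1}
    bs = [b]
    for i in range(1, d + 1):
        bs.append((I + As[i - 1]) @ bs[-1])
    # Base case: Z_d ≈ I because A_d is negligibly small
    x = bs[d]
    # Backward pass: x_i = 1/2 [b_i + (I + A_i) x_{i+1}]
    for i in range(d - 1, -1, -1):
        x = 0.5 * (bs[i] + (I + As[i]) @ x)
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
B = (B + B.T) / 2
A = B / (2 * np.linalg.norm(B, 2))   # symmetric, spectral norm 1/2
As = [A]
for _ in range(6):                    # exact squaring chain A, A^2, A^4, ...
    As.append(As[-1] @ As[-1])

b = rng.standard_normal(6)
x = solve_chain(As, b)
err = np.linalg.norm(x - np.linalg.solve(np.eye(6) - A, b))
```

Note the V-cycle structure the next slide mentions: one forward sweep down the chain, one backward sweep up.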

Page 31:

TOTAL COST

• d = O(log κ)
• ε = 1/d
• nnz(Ai): O(n log^c n log^2 κ)

O(log^c n log κ) depth, O(n log^c n log^3 κ) work

• Multigrid V-cycle-like call structure: each level makes one call to the next
• Answer obtained from d = O(log κ) matrix-vector multiplications

Page 32:

SUMMARY

• Would like to solve LGx = b
• Goal: polylog depth, nearly-linear work
• 'Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound
• Can keep squares sparse
• The operator view of algorithms can drive their design

Page 33:

OUTLINE

• LGx = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms

Page 34:

REPRESENTATION OF (I – A)^-1

The algorithm from [PS `14] gives:
(I – A)^-1 ≈ ½ [I + (I + A0)[I + (I + A1)(I – A2)^-1(I + A1)](I + A0)]

This is a sum and product of O(log κ) matrices. Need: just a product.

Gaussian graphical model sampling:
• Sample from a Gaussian with covariance I – A
• Need C s.t. C^T C ≈ (I – A)^-1

Page 35:

SOLUTION 2

(I – A)^-1 = (I + A)^(1/2) (I – A^2)^-1 (I + A)^(1/2)
≈ (I + A)^(1/2) (I – A1)^-1 (I + A)^(1/2)

Repeat on A1: (I – A)^-1 ≈ C^T C
where C = (I + A0)^(1/2) (I + A1)^(1/2) … (I + Ad)^(1/2)

How to evaluate (I + Ai)^(1/2)?
• A1 ≈ A0^2 has eigenvalues in [0, 1], so the eigenvalues of I + Ai are in [1, 2]
• Well-conditioned matrix ⇒ Maclaurin series expansion = low-degree polynomial
• What about (I + A0)^(1/2)?
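As a numeric check of the C^T C claim (again with an exact chain Ai+1 = Ai^2, where every factor is a polynomial in A and so all factors commute), using an eigendecomposition-based matrix square root in place of the Maclaurin polynomial:

```python
import numpy as np

def sym_sqrt(M):
    """Square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
B = (B + B.T) / 2
A = B / (2 * np.linalg.norm(B, 2))   # symmetric, spectral norm 1/2
I = np.eye(5)

# C = (I + A0)^{1/2} (I + A1)^{1/2} ... with Ai = A^{2^i} (exact chain)
C = I.copy()
P = A.copy()
for _ in range(6):
    C = C @ sym_sqrt(I + P)
    P = P @ P

err = np.linalg.norm(C.T @ C - np.linalg.inv(I - A))
```

With an approximate (sparsified) chain the factors no longer commute, which is exactly why the symmetric placement of the square-root factors on both sides matters in the actual algorithm.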

Page 36: An Efficient Parallel Solver for SDD Linear Systems Richard Peng M.I.T. Joint work with Dan Spielman (Yale)

SOLUTION 3 ([CCLPT `14])

(I – A)-1= (I + A/2)1/2(I – A/2 - A2/2)-1(I + A/2)1/2

• Modified chain: I – Ai+1≈ I – Ai/2 - Ai

2/2

• I + Ai/2 has eigenvalues in [1/2, 3/2]

• Replace with O(loglogκ) degree polynomial / Mclaurin series, T1/2C = T1/2(I + A0/2) T1/2(I + A1/2)…T1/2 (I + Ad/2)

gives (I – A)-1 ≈ CTC, Generalization to (I – A)p (-1 < p <1): T-p/2(I + A0) T-p/2(I + A1) …T-p/2 (I + Ad)

Page 37:

SUMMARY

• Would like to solve LGx = b
• Goal: polylog depth, nearly-linear work
• 'Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound
• Can keep squares sparse
• The operator view of algorithms can drive their design
• Entire class of algorithms / factorizations
• Can approximate a wider class of functions

Page 38:

OPEN QUESTIONS

Generalizations:
• (Sparse) squaring as an iterative method?
• Connections to multigrid/multiscale methods?
• Other functions? log(I – A)? Rational functions?
• Other structured systems?
• Different notions of sparsification?

More efficient:
• How fast for an O(n)-sized sparsifier?
• Better sparsifiers for I – A^2?
• How to represent resistances?
• O(n)-time solver? (O(m log^c n) preprocessing)

Applications / implementations:
• How fast can spectral sparsifiers run?
• What does L^p give for -1 < p < 1?
• Trees (from sparsifiers) as a stand-alone tool?

Page 39:

THANK YOU!

Questions?

Manuscripts on arXiv:
• http://arxiv.org/abs/1311.3286
• http://arxiv.org/abs/1410.5392