Recent Trends in Graph Partitioning for Scientific Computing

Burkhard Monien, Universität Paderborn
Henning Meyerhenke, Georgia Institute of Technology

SIAM Workshop on Combinatorial Scientific Computing
Darmstadt, Germany, May 20th, 2011


Outline
- Introduction
- Global Methods
- Local Search Techniques
- Multilevel Methods
- Methods based on Random Walks and Diffusion
- Related and Future Directions

Introduction
Graph Partitioning in Computer Science and Combinatorial Scientific Computing

Application: Numerical Simulations
Numerical simulations:
- Classical parallel applications
- Domain and corresponding PDEs are discretized into a mesh

Task: Map the mesh onto processors for efficient parallel solution of linear systems (discretized PDEs)

Partition the mesh (or its dual graph) such that:
- Load is balanced
- Communication within solvers is minimized

Figures: YF-17 fighter [www.aero.polimit.it]; crash analysis [www.crash-analysis.com/pages/gallery.shtml]

Application: VLSI Circuit Layout/Design
1) Numerical simulation of semiconductors

2) Layout of chip components
- Communication within a component is cheaper than between components
- Find a layout that minimizes inter-component communication

Figures: Mesh from SRAM simulation [http://www.cogenda.com/article/Genius]; Intel Atom processor [http://photos.macnn.com/news/0912/intelatom45nm-lg2.jpg]

Application: Image Segmentation
Segmentation:
- Find larger regions in an image with similar visual characteristics
- Simplify image, preprocessing

Image modeled as a graph:
- Each pixel is a vertex
- Edges between pixels that are spatially not too far from each other
- Edge weights model visual similarity
Figure source: [http://people.cs.uchicago.edu/~pff/segment/]

Problem Formulation
Traditional static graph partitioning problem (GPP): Given a graph, partition its vertex set into parts by a mapping of vertices to parts such that the partition is balanced ([balance condition not preserved]) and the weight of the cut edges is minimized.

Dynamic case: Repartitioning problem. Solve the GPP with an additional objective: minimum migration costs.
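To make the two criteria of the static problem concrete, here is a minimal Python sketch that evaluates a given partition. The balance rule used (each part holds at most (1 + eps) * ceil(n / k) vertices) is a common convention and an assumption here, since the slide's own balance condition did not survive; the function names and the small grid graph are hypothetical.

```python
from collections import Counter
from math import ceil


def edge_cut(edges, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def is_balanced(part, k, eps=0.03):
    """Assumed balance rule: every part has at most (1 + eps) * ceil(n / k) vertices."""
    limit = (1 + eps) * ceil(len(part) / k)
    sizes = Counter(part.values())
    return all(sizes.get(p, 0) <= limit for p in range(k))


# Hypothetical example: 2 x 3 grid graph with unit edge weights.
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
edges = [(0, 1, 1), (1, 2, 1), (3, 4, 1), (4, 5, 1),
         (0, 3, 1), (1, 4, 1), (2, 5, 1)]
part = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}

cut = edge_cut(edges, part)          # edges (1,2), (3,4), (1,4) are cut
balanced = is_balanced(part, k=2)    # both parts have 3 vertices
```

A repartitioning variant would add a third measure, for example the number of vertices whose part changed between two consecutive partitions.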

Criticism and Adaptations
- Edge cut does not model the cost of solver communication accurately: hypergraph partitioning
- Some solvers profit from good partition shapes: shape optimization
- Connected parts are often desirable
- Synchronous computations: maximum norm instead of summation norm
Reference: B. Hendrickson: Graph Partitioning and Parallel Solvers: Has the Emperor No Clothes? (Extended Abstract). IRREGULAR 1998: 218-225.

Complexity and Approximation Results
Graph partitioning is an NP-hard optimization problem.

O(sqrt(log n)) approximation algorithm for sparsest cut and balanced separators [Arora, Hazan, Kale, SIAM J. Comput. 2010], [Sherman, FOCS 2009]

Approximation algorithms:
- Rather complicated implementation
- Not fast enough in practice, e.g. they solve many flow problems
- For practice: guarantees are still quite far away from the optimum

Heuristics are used in practice.

Global Methods
- Spectral Partitioning
- Geometric Approaches
- Metaheuristics

Spectral Partitioning
- Formulate edge-cut minimization as a binary quadratic program: [formula not preserved]
- Relax the integrality constraint, solve an eigenvector problem
- Pro: Mathematical analysis (connected parts under certain conditions), optimized eigensolvers
- Con: Quality often not comparable to the best methods, (sequential) running time higher than with local optimizers
Reference: A. Pothen, H.D. Simon, K.P. Liou: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. & Appl. 11 (1990), no. 3, pp. 430-452.

Geometric Methods
- Coordinate Nested Dissection (CND) and Recursive Inertial Bisection (RIB): bisect with hyperplanes
- Space-filling curves
- Very fast, low memory consumption, mostly easy to parallelize
- But: coordinates are necessary and (more importantly) the methods are not well suited to artifacts such as holes and fissures
- Mainly used as preprocessing or for specific uses
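As a rough illustration of the spectral approach from the Spectral Partitioning slide above, the following sketch bisects a small unweighted graph using an approximate Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian L = D - A). It assumes a connected graph with at least two vertices and uses a plain shifted power iteration instead of an optimized eigensolver, so it is a teaching sketch and not the method of any particular tool. The function name and the example graph are hypothetical.

```python
def fiedler_bisection(n, edges, iters=500):
    """Bisect an unweighted, connected graph along an approximate Fiedler vector."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    shift = 2 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant eigenvector
        # y = (shift * I - L) x with L = D - A; its dominant remaining
        # eigenvector belongs to the smallest nonzero eigenvalue of L.
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    # Median split: the n // 2 smallest entries form part 0 (balanced bisection).
    order = sorted(range(n), key=lambda i: x[i])
    half = set(order[: n // 2])
    return [0 if i in half else 1 for i in range(n)]


# Hypothetical example: two triangles joined by the single edge (2, 3).
barbell = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
spectral_part = fiedler_bisection(6, barbell)
```

On this example the split separates the two triangles, cutting only the bridging edge. The median split is one design choice for enforcing balance; splitting by sign is another common option.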

Figure source: [Schloegel et al., 2003]
Figure: Lebesgue curve

Metaheuristics: Introduction and Overview
- Metaheuristics have been applied successfully to a variety of optimization problems
- Graph partitioning:
  - Evolutionary / genetic algorithms
  - Population Reinforced Optimization Based Exploration
  - Fusion and fission
  - Simulated annealing
- Drawback: (mostly) very time-consuming
Figure source: [http://brainz.org/15-real-world-applications-genetic-algorithms/]
See e.g. C. Walshaw: Multilevel Refinement for Combinatorial Optimisation: Boosting Metaheuristic Performance. Hybrid Metaheuristics 2008: 261-289.

Local Search Techniques
- Kernighan-Lin
- Fiduccia-Mattheyses
- Helpful Sets

Local Search Heuristics: Overview
- Kernighan-Lin (KL), Fiduccia-Mattheyses (FM)
- Helpful Sets (HS)
- Other variations
- Local search methods improve an existing partition
- Vertex exchanges are based on gain (edge cut improvement by exchange)
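The notion of gain can be sketched in a few lines. The code below computes the FM-style gain of moving a single vertex to the other side of a bisection (external minus internal edge weight) and picks the vertex with the highest gain. It deliberately ignores the balance constraint, vertex locking and the bucket data structures that real KL/FM implementations use; the function names and the example graph are hypothetical.

```python
def gain(v, adj, part):
    """Edge-cut reduction if vertex v moves to the other part of a bisection."""
    external = sum(w for u, w in adj[v] if part[u] != part[v])
    internal = sum(w for u, w in adj[v] if part[u] == part[v])
    return external - internal


def best_move(adj, part):
    """Vertex with the highest gain (balance is ignored in this sketch)."""
    return max(adj, key=lambda v: gain(v, adj, part))


# Hypothetical example: vertex 0 sits alone in part 0 with three cut edges.
adj = {
    0: [(1, 1), (2, 1), (3, 1)],
    1: [(0, 1), (2, 1)],
    2: [(0, 1), (1, 1), (3, 1)],
    3: [(0, 1), (2, 1)],
}
part = {0: 0, 1: 1, 2: 1, 3: 1}

g0 = gain(0, adj, part)      # 3 external edges, 0 internal edges
move = best_move(adj, part)
```

For a KL-style exchange of two vertices a and b from different parts, the combined gain is usually computed as gain(a) + gain(b) - 2 * w(a, b), where w(a, b) is the weight of the edge between them (0 if there is none).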

Figure label: Gain: 3

Discussion of KL/FM/HS
Advantages:
- Very fast (without coordinate data)
- Reasonably good quality in multilevel process
- Very popular, graph tools: Metis, Jostle, Chaco, Scotch, KaPPa, Party; hypergraph tools: hMetis, PaToH, Mondriaan, MLPart, Parkway, Zoltan
Disadvantages:
- KL/FM/HS focus only on edge cut
- Not easy to parallelize
- No quality guarantees for KL/FM

Multilevel Methods
Global View for Local Methods: Matchings, Weighted Aggregation, n-level

Multilevel Strategy
- Local methods need a reasonably good starting solution
- Restrict search space, avoid cutting heavy edges

General procedure:
- Recursive coarsening
- Initial partitioning
- Interpolation and local improvement

Matching Algorithms
Approximate Maximum Weighted Matching
Serial algorithms (among others):
- SHEM (no guarantee), Greedy (1/2-approximation)
- LAM (Preis, 1/2-approximation)
- PGA (Drake and Hougardy, 1/2-approximation)
- GPA, ROMA (Maue and Sanders)
Comparison [Maue, Sanders, WEA 2007]:
- GPA yields better quality than Greedy and PGA
- Randomization techniques such as ROMA often improve quality further
Scalable parallelization requires parallel matching

Guiding Matching Algorithms
- Maximum total weight is not an exact model for coarsening
- Other aspects need to be considered (e.g. uniformity)
Heuristic rationale:
- Contract heavy edges to decrease cut size
- Contract light vertices for uniformity
Idea: Use an edge rating function to guide the matching algorithm according to the rationale, based on local information
- Maximum weighted matching (MWM) algorithms can be reused; the new edge ratings are more meaningful than edge weights

Edge Ratings for Matchings
- KaPPa: Scalable parallelization with MPI, no quality penalty as with other parallel KL/FM partitioners
- FM local search in boundary areas (BFS)
- Four promising edge ratings for matching: [formulas not preserved]
- All yield significantly better partitions than edge weight only
- One good choice: [formula not preserved]
- One of the key ingredients for better quality
- Sequential MWM: Using edge ratings with GPA yields better partitioning quality than SHEM and Greedy
Reference: M. Holtgrewe, P. Sanders, and C. Schulz, Engineering a scalable high quality graph partitioner, in 24th Intl. Parallel and Distributed Processing Symposium (IPDPS 2010). IEEE, 2010, pp. 1-12.
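One coarsening step guided by an edge rating might look like the sketch below. It uses the simple Greedy matching (not GPA) and an illustrative rating w(e) / (c(u) * c(v)), which merely follows the stated rationale of preferring heavy edges between light vertices; this rating is an assumption and not necessarily one of the four ratings from the slide. All function names and the example data are hypothetical.

```python
def rated_greedy_matching(edges, vweight, rating=None):
    """Greedy matching: scan edges by decreasing rating, keep an edge if both ends are free."""
    if rating is None:
        # Assumed illustrative rating: heavy edge, light endpoints.
        rating = lambda u, v, w: w / (vweight[u] * vweight[v])
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: rating(*e), reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def contract(edges, vweight, matching):
    """Merge matched pairs into coarse vertices; sum vertex and parallel edge weights."""
    coarse = {}
    for cid, (u, v) in enumerate(matching):
        coarse[u] = coarse[v] = cid
    next_id = len(matching)
    for v in vweight:  # unmatched vertices become their own coarse vertex
        if v not in coarse:
            coarse[v] = next_id
            next_id += 1
    cweight = {}
    for v, c in coarse.items():
        cweight[c] = cweight.get(c, 0) + vweight[v]
    cedges = {}
    for u, v, w in edges:
        a, b = sorted((coarse[u], coarse[v]))
        if a != b:  # edges inside a coarse vertex disappear
            cedges[(a, b)] = cedges.get((a, b), 0) + w
    return cweight, cedges


# Hypothetical example: path 0 - 1 - 2 - 3 with a light middle edge.
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 5)]
vweight = {0: 1, 1: 1, 2: 1, 3: 1}
matching = rated_greedy_matching(edges, vweight)
cweight, cedges = contract(edges, vweight, matching)
```

Because the heavy edges are contracted, only the light edge survives in the coarse graph, so no partition of the coarse graph can cut a heavy edge. A different rating can be passed through the rating parameter without touching the matching code.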

Weighted Aggregation / AMG
- AMG: Hierarchy-based preconditioner and solver for linear systems
- Idea: AMG coarsening algorithms are also suitable for multilevel graph partitioning (MGP) coarsening
- Weighted aggregation:
  - Choose a coarse vertex set (independent set with strong coupling)
  - Use an interpolation scheme to assign fractions of fine vertices to coarse ones
- 15% edge cut improvement compared to matching when used with simple FM
Figure source: [Chevalier and Safro, 2009]
Reference: C. Chevalier, I. Safro: Comparison of Coarsening Schemes for Multilevel Graph Partitioning. Learning and Intelligent Optimization (LION) 2009: 191-205.

n-level Graph Partitioning
Main idea: Deep but simple hierarchy, simple local heuristic
- Only one edge contraction between consecutive levels
- Edge is chosen based on edge rating function value
- KL-like local search
Main difference:
- Very localized, search only around the uncontracted edge
- Local search is stopped based on a random walk model
Reference: V. Osipov and P. Sanders, n-level graph partitioning, in Proc. 18th Annual European Symposium on Algorithms (ESA'10), 2010, pp. 278-289.

Multilevel Methods: Summary and Conclusions
- Multilevel process is crucial for partitioning quality
- Several new approaches:
  - Edge ratings to guide approximate MWM
  - AMG / weighted aggregation
  - n-level hierarchy
- Substantial quality improvements possible!
- Nice to have: a scalable parallel and publicly available implementation of all these features, to have the choice

Methods based on Random Walks and Diffusion
Identifying Dense Regions with Random Walks/Diffusion

Random Walks and Diffusion
Random walks:
- Stochastic process on graphs, starts on an arbitrary vertex
- Pick the next vertex from the neighbors with probability proportional to edge weight
- Likely to stay in a dense graph region once inside it

Diffusion:
- Desire of a substance to distribute itself in space
- Related to random walks
- Steady state: balanced load on all vertices

Load Balancing by Diffusion
Denote: work load of node [symbols not preserved]

Discrete diffusion: [formula not preserved]

- M is doubly stochastic (random walk analogy!)
- L is positive semidefinite
- The diffusion flow is optimal with respect to the l2-norm

Lemma: [statement not preserved]
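Since the diffusion formula on this slide was lost, the sketch below uses the standard first order scheme (FOS) as an assumption: the load vector is updated as w <- M w with M = I - alpha * L, where L is the graph Laplacian. The step size alpha = 1 / (1 + maximum degree) is one safe choice for convergence, not necessarily the one used by the authors; the function name and the example are hypothetical.

```python
def fos_step(load, adj, alpha):
    """One first order diffusion step, w <- (I - alpha * L) w, on an unweighted graph."""
    return [load[i] - alpha * sum(load[i] - load[j] for j in adj[i])
            for i in range(len(load))]


# Hypothetical example: path graph on 4 vertices, all load starts on vertex 0.
adj = [[1], [0, 2], [1, 3], [2]]
alpha = 1.0 / (1 + max(len(nb) for nb in adj))
load = [8.0, 0.0, 0.0, 0.0]
for _ in range(200):
    load = fos_step(load, adj, alpha)
```

Each step only moves load along edges, so the total load is preserved, and on a connected graph the iteration approaches the balanced state in which every vertex holds the average load (here 2.0).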

Shape Optimizing Partitioning
Idea 1: Compute good partition shapes with small surfaces!
Idea 2: A diffusive process decides which elements go where!

Results in:
- Short partition boundaries
- Small partition diameters
- Few cut edges
- Connected partitions more often
- Small migration costs in case of repartitioning
- Higher, but reasonable running time
Figure panels: Metis (KL); Shape Optimized

k-means and the Bubble Framework
Bubble framework:
- Lloyd's k-means algorithm transferred to graphs
- Basic idea for GP: [Walshaw et al., 1995], [Diekmann et al., 2000]
- Graph distance (path length): does not distinguish dense and sparse regions

Good Shapes with Disturbed Diffusion
- Requirement for the similarity measure: reflect how well connected two vertices/regions are
- Use diffusion! [Schamberger, IPDPS Workshops 2004]
- Diffusion load spreads faster into densely connected regions
- Disturbed diffusion to avoid the balanced state
- Use a set of privileged source vertices

Disturbed Diffusion FOS/C
FOS/C: First Order Scheme with Constant drain [formula not preserved]

The source set determines the structure of the drain vector.
Lemma: The diffusive iteration converges if [condition not preserved]; the solution can be computed by solving a linear system: [formula not preserved]

(FOS/C procedure)
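A minimal sketch of a disturbed diffusion scheme in the spirit of FOS/C follows. The slide's formulas are missing, so the details are assumptions based on the usual description of the scheme: the iteration is w <- M w + d with M = I - alpha * L, and the drain vector d gives every vertex a constant drain of -delta while the source vertices additionally share an inflow of delta * n. Instead of solving the linear system directly, the code simply iterates to the steady state, which then satisfies alpha * L w = d. Function names and the example graph are hypothetical.

```python
def fos_c(adj, sources, delta=1.0, iters=2000):
    """Iterate w <- (I - alpha * L) w + d to the steady state of disturbed diffusion."""
    n = len(adj)
    alpha = 1.0 / (1 + max(len(nb) for nb in adj))
    # Assumed drain vector: entries sum to zero, sources receive the surplus.
    drain = [delta * n / len(sources) - delta if i in sources else -delta
             for i in range(n)]
    load = [0.0] * n
    for _ in range(iters):
        load = [load[i] - alpha * sum(load[i] - load[j] for j in adj[i]) + drain[i]
                for i in range(n)]
    return load, drain, alpha


# Hypothetical example: path graph on 5 vertices with vertex 0 as the only source.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
load, drain, alpha = fos_c(adj, {0})

# Residual of the steady-state linear system alpha * L w = d.
residual = [alpha * sum(load[i] - load[j] for j in adj[i]) - drain[i]
            for i in range(len(adj))]
```

In the steady state the load is highest at the source and decreases with the distance from it, which is what makes the values usable as a similarity measure between a vertex and a source set.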


Bubble Operation AssignPartition
Input: Centers. Output: Partition.
- For each part: Solve the FOS/C procedure with the center as source vertex, disturbance by drain vector
- Linear system for each part: [formula not preserved]
- Assignment of vertex to a part: [formula not preserved]
- Two balancing procedures also use diffusion values
- Independent operations: parallelism
Reference: H. Meyerhenke, B. Monien, S. Schamberger: Graph Partitioning and Disturbed Diffusion. Parallel Computing, 35(10-11):544-569, 2009.

Bubble Operation ComputeCenters
Input: Partition. Output: Center set C.
- For each part: Solve the FOS/C procedure with the vertices of the part as source set, disturbance by drain vector
- Linear system for each part: [formula not preserved]
- New center of part: [formula not preserved]
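The two Bubble operations can be sketched together as follows. Because the assignment and center formulas are missing from the transcript, the rules used here are assumptions: a vertex joins the part whose diffusion system gives it the highest load, and the new center of a part is its member vertex with the highest load when the whole part acts as source set. The helper repeats a small iterative disturbed diffusion solver (an assumed form of FOS/C) so that the block is self-contained; balancing is omitted and all names are hypothetical.

```python
def fos_c(adj, sources, delta=1.0, iters=1000):
    """Steady state of w <- (I - alpha * L) w + d (assumed form of disturbed diffusion)."""
    n = len(adj)
    alpha = 1.0 / (1 + max(len(nb) for nb in adj))
    drain = [delta * n / len(sources) - delta if i in sources else -delta
             for i in range(n)]
    load = [0.0] * n
    for _ in range(iters):
        load = [load[i] - alpha * sum(load[i] - load[j] for j in adj[i]) + drain[i]
                for i in range(n)]
    return load


def assign_partition(adj, centers):
    """One diffusion system per center; each vertex joins the part with the highest load."""
    loads = [fos_c(adj, {c}) for c in centers]
    return [max(range(len(centers)), key=lambda p: loads[p][v])
            for v in range(len(adj))]


def compute_centers(adj, part, k):
    """One diffusion system per part; the member with the highest load becomes the center."""
    centers = []
    for p in range(k):
        members = [v for v in range(len(adj)) if part[v] == p]
        load = fos_c(adj, set(members))
        centers.append(max(members, key=lambda v: load[v]))
    return centers


# Hypothetical example: path graph on 8 vertices with two initial centers.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
part = assign_partition(adj, [1, 6])
centers = compute_centers(adj, part, 2)
```

The per-part systems do not depend on each other, so both loops could run in parallel. A full Bubble iteration would alternate the two operations and add the balancing procedures mentioned on the slide.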

ComputeCenters is independent for each part: parallelism.

Optimization Criterion
- Quadratic optimization problem for min balanced cut: [formula not preserved]
- Spectral methods: Relax the integrality constraint and solve an eigenvector problem
- Theorem: Under mild conditions, AssignPartition followed by ScaleBalance together compute the global minimum of a similar relaxed optimization problem.
Reference: H. Meyerhenke: Beyond good shapes: Diffusion-based graph partitioning is relaxed cut optimization. In Proc. 21st International Symposium on Algorithms and Computation (ISAAC). Springer, 2010. Invited to special ISAAC 2010 issue of Algorithmica.

The Bubble-FOS/C Heuristic: Discussion
Advantages:
- Mathematical analysis: proven convergence, relaxed edge cut optimization
- Good experimental results on FEM graphs
Disadvantage:
- High running time (due to linear system solving)
Consequence: Use a simpler diffusive mechanism to retain the good properties, but accelerate the process

Random Walks for Clustering
Suitable random walk length is important!
Distance or similarity measures based on random walks:
- Euclidean Commute Time Distance (ECTD) [Fouss et al., IEEE Trans. KDE 2007]
- Algebraic distances [Chen and Safro, 2010]
- Diffusion distances [Lafon and Lee, PAMI 2006]
Other clustering methods with similar ideas:
- Markov Clustering [van Dongen, 2000]
- Clustering spatial data using random walks [Harel and Koren, KDD 2001]
- Isoperimetric graph partitioning [Grady and Schwartz, SIAM J. SC, 2006]

Faster Local Diffusive Approach
TruncCons, k-way extension and variant of [Pellegrini, Euro-Par 2007]
- Consolidation: Same initial load for nodes of the current subdomain, all others 0.
- Stop/truncate FOS after very few iterations.
- Rationale: Local improvement, load flows from the subdomain borders into the graph.
- Computational work can be restricted to the area near the part boundaries
- Repeat the process with the newly computed partition
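The truncated diffusion idea of the TruncCons slide can be sketched as below. Several details are assumptions, since the slide gives only the outline: the consolidated initial load is n / |part| on the part's own vertices, the diffusion step is plain FOS with alpha = 1 / (1 + maximum degree), each vertex is reassigned to the part with the highest load, and the work is not restricted to the boundary area as a tuned implementation would do. The function name, parameter defaults and example are hypothetical.

```python
def trunc_cons(adj, part, k, fos_iters=3, rounds=5):
    """Repeated consolidation plus a few FOS steps, then reassignment by highest load."""
    n = len(adj)
    alpha = 1.0 / (1 + max(len(nb) for nb in adj))
    for _ in range(rounds):
        loads = []
        for p in range(k):
            size = sum(1 for v in range(n) if part[v] == p)
            # Consolidation: equal initial load inside the part, zero elsewhere.
            w = [n / size if part[v] == p else 0.0 for v in range(n)]
            # Truncation: stop the diffusion after very few iterations.
            for _ in range(fos_iters):
                w = [w[i] - alpha * sum(w[i] - w[j] for j in adj[i]) for i in range(n)]
            loads.append(w)
        part = [max(range(k), key=lambda p: loads[p][v]) for v in range(n)]
    return part


# Hypothetical example: path graph on 8 vertices with a ragged bisection.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
smoothed = trunc_cons(adj, [0, 0, 0, 1, 0, 1, 1, 1], 2)
```

On this example the isolated vertices at the boundary are absorbed by their surroundings, leaving two connected halves. Because only a few diffusion steps are performed, load travels only a short distance, which is why the computation can in principle be limited to vertices near the part boundaries.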

Reference: H. Meyerhenke, B. Monien, T. Sauerwald: A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions of Very High Quality. In Proc. 22nd IEEE Internatl. Parallel and Distributed Processing Symposium (IPDPS'08). Winner of the Best Algorithms Paper Award.

DibaP: Diffusion-based Partitioning
Hybrid Multilevel Algorithm
DibaP: Hybrid algorithm, Multilevel + Bubble-FOS/C + TruncCons
1) + 2): Coarsening
- Approx. maximum weighted matching
- Algebraic multigrid (AMG)
3) Initial partitioning: Bubble-FOS/C
4) + 5): Local improvement
- Small hierarchy levels: Bubble-FOS/C
- Large hierarchy levels: TruncCons

DibaP: Experimental Results (1): Partitioning
- Walshaw's archive tracks the best partitions for 34 benchmark graphs (24 entries per graph)
- At the time more than 80 records, now 16 left
- Table: Average results for 8 benchmark graphs (sum norm) [table not preserved]
- Maximum norm: Even slightly higher improvement with DibaP

DibaP: Experimental Results (2): Repartitioning [M., M., SIAM CSE 2011]
- Repartitioning of 2D synthetic dynamic graph sequences
- MPI parallel implementation pDibaP
- Running time ca. 35x slower than ParMetis
- Ca. 15-30% higher quality

Diffusive Graph Partitioning: Summary and Outlook
- Good solution quality
- High, but acceptable running time
- Theoretical foundation
- Especially suitable for repartitioning

Further acceleration desirable:
- Combination with other techniques, faster solvers
- Faster implementations, tailored to parallel hardware

Adaptation to a different scenario:
- Clustering of P2P networks [Gehweiler and Meyerhenke, HPGC10]

Related and Future Directions

What could not be covered
- Hypergraph partitioning: Models communication better in many cases; so far mostly used with KL/FM. Can the new techniques developed for graph partitioning be applied to hypergraphs as well?
- Flow-based algorithms: MQI [Lang and Rao, IPCO 2004], KaFFPa [Sanders and Schulz, TR 2011] exploit max-flow min-cut; related implementations: [Lang, Mahoney, Orecchia, SEA 2009]
- Theoretical work on local partitioning with random walks [Andersen, Chung, Lang, FOCS 2006], [Andersen, Peres, STOC 2009]
- Resource awareness [Walshaw, Cross, FGCS 2001], [Moulitsas and Karypis, ICA3PP 2008]
- Semi-definite programming and other optimization techniques
- Practical methods for other applications (road networks: [Delling et al., TR 2010])

Future Directions in GP
- Massive (and hierarchical) parallelism
- Heterogeneity: architectures, workloads, architecture mapping
- Transfer new techniques to hypergraphs
- Dynamic graphs
- Better theoretical understanding of new techniques
- Social networks: power law degree distribution, dynamics
Figure sources: [http://www.nvidia.com], [prblog.typepad.com]

10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering
- Capture the state of the art in graph partitioning and graph clustering
- Participation: provide data, submit solvers
- Development phase has started
- More info: http://www.cc.gatech.edu/dimacs10/

Thank you!

Acknowledgments: This work was partially supported by German Research Foundation (DFG) Priority Programme 1307 Algorithm Engineering, by DARPA Ubiquitous High Performance Computing (UHPC), and by the CASS-MT Center of Pacific Northwest National Laboratory (PNNL).