Partitioning using Mesh Adjacencies


Upload: harriet-gray

Post on 21-Jan-2016


TRANSCRIPT

Page 1

Partitioning using Mesh Adjacencies

Graph-based dynamic balancing
- Parallel construction and balancing of a standard partition graph with small cuts takes reasonable time
- In the case of unstructured meshes, a graph node represents a mesh region, and mesh adjacencies define the edges

Mesh adjacencies are a more complete representation than a standard partition graph
- All mesh entities can be considered (a graph has to decide what defines graph nodes, and information on the adjacencies that define the graph edges is lost)
- Any adjacency is obtained in O(1) time, as opposed to having to construct multiple graphs (assuming use of a complete mesh adjacency structure)

Possible advantages
- Avoid graph construction (assuming you have the needed adjacencies)
- Account for multiple entity types, which is important for the solve process, typically the most computationally expensive step
- Easy to use with diffusive procedures, but not ideal for “global” balancing

Disadvantage
- Lack of well-developed algorithms for parallel partitioning operations directly from mesh adjacencies
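To make the graph-versus-adjacency point concrete, here is a minimal sketch of how a standard partition graph could be derived from region-to-face adjacencies. The function name `dual_graph` and the tiny three-tetrahedron mesh are illustrative assumptions, not part of ParMA; a real mesh database would return these adjacencies directly instead of rebuilding them from vertex tuples.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph(tets):
    """Build region-to-region graph edges from shared triangular faces.

    `tets` is a list of 4-tuples of vertex ids (a hypothetical input format).
    Returns the sorted graph edges and the face -> regions adjacency map.
    """
    face_to_regions = defaultdict(list)
    for rid, verts in enumerate(tets):
        # Each tetrahedron is bounded by the 4 triangles formed by its vertices.
        for face in combinations(sorted(verts), 3):
            face_to_regions[face].append(rid)

    edges = set()
    for regions in face_to_regions.values():
        # An interior face is shared by exactly two regions: one graph edge.
        if len(regions) == 2:
            edges.add((regions[0], regions[1]))
    return sorted(edges), face_to_regions


# Three tetrahedra glued face to face (made-up connectivity).
tets = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
edges, face_to_regions = dual_graph(tets)
print(edges)
```

The resulting graph keeps only region-face-region connectivity; the vertex and edge information present in the mesh adjacencies is no longer visible to a partitioner that sees the graph alone.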

Page 2

ParMA: Partition Improvement

Improve scaling of applications by reducing imbalances through exchange of mesh regions between neighboring parts
- Current algorithm focused on improved scalability of the solve by accounting for balance of multiple entity types
- Imbalance is limited to a small number of heavily loaded parts, referred to as spikes, which limit the scalability of applications
- Example: Reduce the small number of entity imbalance spikes at the cost of an increase in imbalance in regions, which are the entities used as the nodes in the standard graph

Similar approaches can be used to:
- Improve balance when using multiple parts per process; may be as good as a full rebalance for lower total cost
- Improve balance during mesh adaptation; likely want extensions past simple diffusive methods

Page 3

ParMA: Application Requirements

Example of C0, linear shape function finite elements
- Assembly sensitive to mesh element imbalances
- Solve sensitive to mesh vertex imbalances since vertices hold the dofs; this is the dominant computation
- Heaviest loaded part dictates solver performance

Element-based partitioning results in spikes of dofs

Element imbalance increased from 2.64% to 4.54%
Dof imbalance reduced from 14.7% to 4.92%
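The slides do not define how the imbalance percentages are computed. A common convention, assumed here, is the heaviest part's load relative to the average load, minus one. The sketch below uses made-up per-part counts to show how a partition with balanced elements can still have a vertex (dof) spike.

```python
def imbalance_percent(loads):
    """Imbalance as (max / average - 1) * 100, an assumed definition."""
    avg = sum(loads) / len(loads)
    return (max(loads) / avg - 1.0) * 100.0


# Hypothetical per-part counts for four parts (not data from the slides).
elements = [100, 100, 100, 100]   # perfectly balanced elements
vertices = [40, 40, 40, 60]       # one part holds a spike of vertices

print(f"element imbalance: {imbalance_percent(elements):.2f}%")
print(f"vertex imbalance:  {imbalance_percent(vertices):.2f}%")
```

Because the heaviest part dictates solver performance, the maximum rather than the variance is the quantity of interest under this convention.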

Page 4

ParMA: Algorithm

Input:
- Types of mesh entities that need to be balanced (Rgn, Face, Edge, Vtx)
- The relative importance (priority) between them (= or >)
- The balance of entities not specified in the input is not explicitly improved or preserved
- Mesh with complete representation and communication, computation and migration weights for each entity

Algorithm:
- From high to low priority if separated by “>” (different groups)
- From low to high dimensions based on entity topologies if separated by “=” (same group)
- Compute migration schedule
- Select regions for migration and migrate

e.g., “Rgn>Face=Edge>Vtx” is the user’s input
- Step 1: improve balance for mesh regions
- Step 2.1: improve balance for mesh edges
- Step 2.2: improve balance for mesh faces
- Step 3: improve balance for mesh vertices
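The ordering rule above can be sketched as a small parser. The function name `balancing_order` and the `DIMENSION` table are illustrative assumptions; only the entity names and the meaning of “>” and “=” come from the slide.

```python
# Topological dimension of each entity type named on the slide.
DIMENSION = {"Vtx": 0, "Edge": 1, "Face": 2, "Rgn": 3}


def balancing_order(spec):
    """Turn a priority string into an ordered list of balancing groups.

    Groups separated by '>' run from high to low priority; within a group
    joined by '=', entity types run from low to high dimension.
    """
    steps = []
    for group in spec.split(">"):
        types = [name.strip() for name in group.split("=")]
        for name in types:
            if name not in DIMENSION:
                raise ValueError(f"unknown entity type: {name!r}")
        steps.append(sorted(types, key=DIMENSION.__getitem__))
    return steps


order = balancing_order("Rgn>Face=Edge>Vtx")
print(order)
```

For the example input this yields regions first, then edges before faces, then vertices, matching Steps 1, 2.1, 2.2 and 3.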

Page 5

ParMA: Application Defined Partition Criteria

Application defined priority list of entity types such that imbalance of high priority types is not increased when balancing lower priority types
- Satisfying multiple constraints simultaneously is difficult as more are added
- Multi-constraint graph based partitioning methods balance all constraints equally [Karypis1999, Karypis2003, Aykanat2008]
- Constraint priorities give flexibility to element migration and selection procedures that can result in increased partition quality

Quantify balance requirements with application defined weights on mesh entities: communication, computation, and data migration

Page 6

ParMA: Migration Schedule

Coordination needed to migrate elements between parts without ‘stepping on toes’
- Ex) Consider three adjacent parts, two of which are heavily loaded, the other lightly. The two heavily loaded parts migrate elements to the lightly loaded part, making it heavily loaded.

Migrate computational load to the correct part
- Multilevel graph schemes create several partitions before converging to the final partition; the mesh element migration cost is only paid once to create the final partition
- Apply Hu and Blake’s diffusive solution algorithm to determine a low migration cost migration schedule that balances computational load for a given mesh entity type. [HuBlake]

Figure 1. Diffusive Solution [Dongarra2002]
- Green parts are overweight by 10
- White parts are underweight by 10
- Yellow parts have average weights
- The diffusive solution is noted on each edge
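A diffusive solution of this kind can be sketched by solving a graph Laplacian system: with b_i equal to part i's load minus the average, solve L λ = b and send λ_i − λ_j along each edge (i, j). The code below is a minimal serial illustration under that formulation, assuming a connected part graph; the function name, the exact-arithmetic solver, and the three-part example loads are assumptions for illustration, not ParMA's implementation.

```python
from fractions import Fraction


def diffusive_schedule(loads, edges):
    """Return the load to send along each edge (i, j); negative means j -> i.

    Solves L * lam = b, with L the part-graph Laplacian and
    b[i] = loads[i] - average. Assumes the part graph is connected.
    """
    n = len(loads)
    avg = Fraction(sum(loads), n)
    lap = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    b = [Fraction(w) - avg for w in loads]

    # The Laplacian is singular, so pin the last potential to zero and
    # solve the remaining (n-1) x (n-1) system by Gauss-Jordan elimination.
    m = n - 1
    aug = [lap[r][:m] + [b[r]] for r in range(m)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    lam = [aug[r][m] / aug[r][r] for r in range(m)] + [Fraction(0)]
    return {(i, j): lam[i] - lam[j] for i, j in edges}


# Three mutually adjacent parts: two heavy (40 each) and one light (10).
schedule = diffusive_schedule([40, 40, 10], [(0, 1), (0, 2), (1, 2)])
for (i, j), amount in schedule.items():
    print(f"part {i} -> part {j}: {amount}")
```

In this example each heavy part sends 10 units to the light part and nothing to the other heavy part, so all three parts end at the average of 30 and the light part is not overloaded.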

Page 7

ParMA: Region Selection

- Vertex: The vertices on inter-part boundaries bounding a small number of regions on source part P0; tips of ‘spikes’
- Edge: The edges on inter-part boundaries bounding a small number of faces; ‘ridge’ edges with (a) 2 bounding faces, and (b) 3 bounding faces on source part P0
- Face/Region: Regions which have two or three faces on inter-part boundaries; (a) ‘spike’ region (b) region on a ‘ridge’

Apply a KL/FM-like greedy heuristic to measure the relative change, or gain, in communication cost if a given mesh element is migrated

Migrate regions that have a large ratio of computational cost to migration cost: high ‘bang for the buck’
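One way to picture the Face/Region rule together with the cost ratio is the greedy sketch below. The `Region` record, its field names, and the stopping rule are hypothetical; the KL/FM-like communication gain is not modeled here, only the face-count filter and the ranking by computational cost over migration cost.

```python
from dataclasses import dataclass


@dataclass
class Region:
    rid: int
    boundary_faces: int   # faces lying on the inter-part boundary
    comp_weight: float    # computational cost of the region
    migr_weight: float    # cost of migrating the region


def select_regions(regions, load_to_send):
    """Greedily pick candidate regions until the scheduled load is covered."""
    # Candidates: regions with two or three faces on the inter-part boundary.
    candidates = [r for r in regions if 2 <= r.boundary_faces <= 3]
    # Highest computational cost per unit of migration cost first.
    candidates.sort(key=lambda r: r.comp_weight / r.migr_weight, reverse=True)

    chosen, sent = [], 0.0
    for region in candidates:
        if sent >= load_to_send:
            break
        chosen.append(region.rid)
        sent += region.comp_weight
    return chosen


# Made-up regions on a source part.
regions = [
    Region(0, 3, 4.0, 1.0),   # 'spike' region, ratio 4.0
    Region(1, 2, 2.0, 1.0),   # 'ridge' region, ratio 2.0
    Region(2, 1, 5.0, 1.0),   # only one boundary face: not a candidate
    Region(3, 2, 3.0, 2.0),   # 'ridge' region, ratio 1.5
]
picked = select_regions(regions, load_to_send=5.0)
print(picked)
```

Here `load_to_send` stands for the amount the migration schedule assigns to one neighboring part; selection stops as soon as that amount is covered.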

Page 8

ParMA: Strong Scaling – 1B Mesh up to 160k Cores

AAA 1B elements: effective partitioning at extreme scale with and without ParMA (uniform weights, iterative migration using simple schedule)

(see graph: strong scaling up to the full system, without ParMA and with ParMA; plot label: PMod)

Page 9

ParMA: Tests

133M region mesh on 16k parts

Table 1: User’s input

Table 2: Balance of partitions

Table 3: Time usage and iterations (tests on Jaguar Cray XT5 system)