An Evaluation of Partitioners for Parallel SAMR Applications
Sumir Chandra & Manish Parashar
ECE Dept., Rutgers University
Submitted to: Euro-Par 2001: European Conference on Parallel Computing
Introduction
• AMR (Adaptive Mesh Refinement) is used for solving PDEs in dynamic applications
• Challenges involved:
  • Dynamic resource allocation
  • Dynamic data distribution and load balancing
  • Communication and coordination
  • Partitioning of the adaptive grid hierarchy
• Evaluation of dynamic domain-based partitioning strategies with an application-centric approach
Motivation & Goal
• Even for a single application, the most suitable partitioning technique depends on the input parameters and the application's run-time state
• Application-centric characterization of partitioners as a function of number of processors, problem size, and granularity
• Enable the run-time selection of partitioners based on input parameters and application state
Adaptive Mesh Refinement
• Start with a base coarse grid with minimum acceptable resolution
• Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
• Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
• The resulting grid structure is a dynamic adaptive grid hierarchy
The Berger-Oliger Algorithm

Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step t on all grids at level "level"
    If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
    End if
End Recursion

level = 0
Integrate(level)
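The recursion above can be sketched as a small runnable program. This is an illustrative sketch, not the authors' implementation: regridding is omitted, "step" and "update" are stand-in actions recorded in a trace, and the sub-cycling loop reflects the factor-2 space-time refinement used in the experiments (the slide's pseudocode leaves the number of fine steps implicit).

```python
# Sketch of the Berger-Oliger time-stepping recursion.
# `trace` records the order of operations; "step" and "update"
# stand in for the real integration and coarse-grid projection.

def integrate(level, hierarchy, trace):
    """Advance all grids at `level`, sub-cycling finer levels in time."""
    trace.append(("step", level))            # step t on grids at this level
    if level + 1 < hierarchy["levels"]:      # does level + 1 exist?
        # Factor-2 space-time refinement: the finer level takes
        # `time_refine` smaller steps per coarse step.
        for _ in range(hierarchy["time_refine"]):
            integrate(level + 1, hierarchy, trace)
        trace.append(("update", level))      # Update(level, level + 1)

hierarchy = {"levels": 3, "time_refine": 2}  # hypothetical 3-level hierarchy
trace = []
integrate(0, hierarchy, trace)               # level = 0; Integrate(level)
```

Running this shows the characteristic sub-cycling pattern: each coarse step triggers two steps on the next finer level before the fine solution is injected back.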
Partitioning Adaptive Grid Hierarchies
[Figure: a 2-D SAMR grid hierarchy at time steps 0, 40, 80, 120, 160, and 182; the legend distinguishes refinement levels 0 through 4]
Partitioning Techniques
• Static or dynamic techniques
• Geometric or non-geometric
• Dynamic partitioning: global or local approaches
• Partitioners for SAMR grid applications: patch-based, domain-based, hybrid
Partitioners Evaluated
• SFC: Space-Filling Curve based partitioning
• G-MISP: Geometric Multi-level Inverse Space-filling curve Partitioning
• G-MISP+SP: Geometric Multi-level Inverse Space-filling curve Partitioning with Sequence Partitioning
• pBD-ISP: p-way Binary Dissection Inverse Space-filling curve Partitioning
• SP-ISP: "Pure" Sequence Partitioning with Inverse Space-filling curve Partitioning
• WD: Wavefront Diffusion based on global work load
SFC
• Recursive linear representation of the multi-dimensional grid hierarchy using space-filling mappings (N-D-to-1-D mapping)
• Computational load determined by segment length and recursion level
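To make the N-D-to-1-D idea concrete, the sketch below uses a Morton (Z-order) key, one common space-filling mapping, to linearize a 2-D cell index space and then cuts the resulting 1-D order into contiguous per-processor segments. This is an illustration under simplifying assumptions (a greedy cut on per-cell loads), not the paper's exact SFC scheme.

```python
def morton2d(x, y, bits=8):
    """Interleave the bits of (x, y) into a Z-order (Morton) key,
    giving a simple N-D-to-1-D space-filling mapping."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def sfc_partition(cells, loads, nprocs):
    """Sort cells along the curve, then cut the 1-D order into nprocs
    contiguous segments of roughly equal total load (greedy cut)."""
    order = sorted(range(len(cells)), key=lambda i: morton2d(*cells[i]))
    target = sum(loads) / nprocs          # ideal load per processor
    parts, current, acc = [], [], 0.0
    for i in order:
        current.append(cells[i])
        acc += loads[i]
        if acc >= target and len(parts) < nprocs - 1:
            parts.append(current)         # close this processor's segment
            current, acc = [], 0.0
    parts.append(current)                 # last processor gets the remainder
    return parts
```

Because neighboring keys on the curve tend to be neighbors in space, contiguous 1-D segments yield reasonably compact subdomains, which is why SFC-based partitioners preserve locality cheaply.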
G-MISP & G-MISP+SP
• G-MISP: multi-level algorithm that views the matrix of workloads from the SAMR grid hierarchy as a one-vertex graph, refined recursively; gains speed at the expense of load balance
• G-MISP+SP: "smarter" variant of G-MISP that uses sequence partitioning to assign consecutive portions of the one-dimensional list to processors; load balance improves but the scheme is computationally more expensive
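The "sequence partitioning" step that G-MISP+SP adds can be illustrated with the classic linear-partition dynamic program: split a 1-D list of workloads into p consecutive pieces so that the heaviest piece is as light as possible. This is a textbook formulation offered as a sketch, not the authors' code, and it shows why the variant is more expensive: the DP costs O(p·n²) versus a single greedy pass.

```python
def sequence_partition(weights, p):
    """Optimal contiguous split of `weights` into p parts minimizing the
    heaviest part's total load (linear-partition DP).
    Returns (best max load, list of cut indices)."""
    n = len(weights)
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)     # prefix sums for O(1) range loads
    INF = float("inf")
    # cost[k][i]: best achievable max load splitting the first i items into k parts
    cost = [[INF] * (n + 1) for _ in range(p + 1)]
    cut = [[0] * (n + 1) for _ in range(p + 1)]
    cost[0][0] = 0
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):     # j = start of the k-th part
                c = max(cost[k - 1][j], prefix[i] - prefix[j])
                if c < cost[k][i]:
                    cost[k][i], cut[k][i] = c, j
    cuts, i = [], n                       # walk the cut table backwards
    for k in range(p, 0, -1):
        i = cut[k][i]
        cuts.append(i)
    return cost[p][n], sorted(cuts)[1:]   # drop the leading 0
```

For example, splitting loads [1, 2, 3, 4, 5] across 2 processors cuts before index 3, giving parts of load 6 and 9 rather than the 10/5 a naive halving would produce.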
pBD-ISP
• Generalization of binary dissection: the domain is partitioned into p partitions
• Each split divides the load as evenly as possible, considering the processors
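The splitting rule can be sketched in one dimension: each recursive split assigns floor(p/2) processors to one side and the rest to the other, cutting where the load ratio best matches the processor ratio, so p need not be a power of two. This is a 1-D illustration of the idea; the paper's pBD-ISP operates on the inverse-space-filling-curve index space.

```python
def binary_dissect(loads, lo, hi, p):
    """Recursive p-way binary dissection of a 1-D load array over
    indices [lo, hi): each split gives p//2 processors one side and
    cuts where the accumulated load best matches that share.
    Returns a list of (lo, hi) index ranges, one per processor."""
    if p == 1:
        return [(lo, hi)]
    p_left = p // 2
    total = sum(loads[lo:hi])
    target = total * p_left / p           # load share for the left side
    acc, cutpos = 0.0, lo
    while cutpos < hi - 1 and acc + loads[cutpos] <= target:
        acc += loads[cutpos]
        cutpos += 1
    return (binary_dissect(loads, lo, cutpos, p_left)
            + binary_dissect(loads, cutpos, hi, p - p_left))
```

With uniform loads the recursion reduces to an even split; with skewed loads the cut point shifts so each side's load tracks its processor count.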
SP-ISP
• Domain sub-divided into p*b equally sized blocks
• Dual-level algorithm with parameter settings for each level
• Fine-granularity scheme: good load balance but increased overhead, communication, and computational cost
WD
• Part of the ParMetis suite, based on global workload; used for repartitioning graphs with scattered refinements
• Results in fine-grain partitionings with jagged boundaries, and increased communication costs and overheads
• Metis integration extremely expensive; dedicated SAMR partitioners performed much better
• Two extra steps needed for Metis in our interface: a Metis graph is generated from the grid before partitioning, and clustering is used to regenerate grid blocks from the graph partitions after partitioning
Experimental Setup
• Application: RM3D
  • 3-D "real world" compressible turbulence application solving the Richtmyer-Meshkov instability
  • Fingering instability which occurs at a material interface accelerated by a shock wave
• Machine: NPACI IBM SP2 Blue Horizon at SDSC
  • Teraflop-scale Power3-based SMP cluster
  • 1152 processors and 512 GB of main memory
  • AIX operating system
  • Peak bi-directional data transfer rate of approx. 115 MB/s
Experimental Setup (contd.)
• Base coarse grid: 128 * 32 * 32
• 3 levels of factor-2 space-time refinements
• Application ran for 150 coarse-level time-steps
• Experiments consisted of varying:
  • Partitioner (from the set of evaluated partitioners)
  • Number of processors (16 – 128)
  • Granularity, i.e. the atomic unit (2*2*2 – 8*8*8)
• Metrics used: total run-time, maximum load imbalance, AMR efficiency
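Of the three metrics, maximum load imbalance has the most room for interpretation. The slides do not spell out the formula, so the snippet below uses one common definition, the percentage excess of the most-loaded processor over the average, purely as an assumed illustration.

```python
def max_load_imbalance(loads):
    """Maximum load imbalance in percent, taken here (an assumption,
    not the paper's stated formula) as the excess of the most-loaded
    processor over the average per-processor load."""
    avg = sum(loads) / len(loads)
    return 100.0 * (max(loads) - avg) / avg
```

Under this definition a perfectly balanced run scores 0%, and a run where one processor carries 1.5x the average load scores 50%.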
Experimental Results

RM3D application on 16 processors with granularity 2

Partitioner   Run-time (s)   Max. Load Imbalance (%)   AMR Efficiency (%)
SFC           3315.22        1.629                     72.388
G-MISP        2931.08        55.431                    77.745
G-MISP+SP     2805.54        5.834                     77.851
pBD-ISP       2601.05        28.498                    83.169
SP-ISP        3136.32        204.548                   82.207
[Charts: run-times, max. load imbalance, and AMR efficiency for the evaluated partitioners]
Experimental Evaluation
• RM3D needs rapid refinement and efficient redistribution
• pBD-ISP, G-MISP+SP, and SFC are best suited for RM3D: fast partitioners with low imbalance that maintain good communication patterns
• pBD-ISP is fastest, but has average load imbalance
• G-MISP+SP and SFC generate the lowest imbalance but are relatively slower
• The evaluated partitioning techniques scale reasonably well
Evaluation (contd.)
• Coarse granularity produces high load imbalance
• Fine granularity leads to greater synchronization and coordination overheads, and higher execution times
• Choosing the optimal partitioning granularity requires a trade-off between execution speed and load imbalance
• For the RM3D application, a granularity of 4 gives the lowest execution time with acceptable load imbalance
Conclusions
• Experimental evaluation of dynamic domain-based partitioning and load-balancing techniques for the RM3D compressible turbulence application
• Effect of the choice of partitioner and granularity on execution time
• Formulation of an application-centric characterization of the partitioners as a function of number of processors, problem size, and partitioning granularity