![Page 1: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/1.jpg)
An Evaluation of Partitioners for Parallel SAMR Applications
Sumir Chandra & Manish Parashar
ECE Dept., Rutgers University
Submitted to: Euro-Par 2001: European Conference on Parallel Computing
![Page 2: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/2.jpg)
Introduction
• AMR – Adaptive Mesh Refinement
• AMR used for solving PDEs for dynamic applications
• Challenges involved:
   • Dynamic resource allocation
   • Dynamic data distribution and load balancing
   • Communication and co-ordination
• Partitioning of adaptive grid hierarchy
• Evaluation of dynamic domain-based partitioning strategies with an application-centric approach
![Page 3: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/3.jpg)
Motivation & Goal
• Even for a single application, the most suitable partitioning technique depends on input parameters and its run-time state
• Application-centric characterization of partitioners as a function of number of processors, problem size, and granularity
• Enable the run-time selection of partitioners based on input parameters and application state
![Page 4: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/4.jpg)
Adaptive Mesh Refinement
• Start with a base coarse grid with minimum acceptable resolution
• Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
• Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
• Resulting grid structure is a dynamic adaptive grid hierarchy
The Berger-Oliger Algorithm

Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step Δt on all grids at level “level”
    If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
    End if
End Recursion

level = 0
Integrate(level)
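The recursive procedure above can be traced with a small Python sketch. It is only an illustration of the call order: the function records which operation would run at each level instead of touching real grids. The names `integrate`, `num_levels`, `regrid_time`, and `refine_factor` are hypothetical, and the sub-stepping loop is an assumption that is not spelled out on the slide (with `refine_factor=1` the sketch follows the slide's single recursive call).

```python
from typing import List


def integrate(level: int, num_levels: int, log: List[str],
              regrid_time: bool = False, refine_factor: int = 1) -> None:
    """Record the order of operations of the recursive integration."""
    if regrid_time:
        log.append(f"regrid level {level}")
    # Advance every grid at this level by one time step.
    log.append(f"step level {level}")
    if level + 1 < num_levels:
        # Assumption: the finer level takes `refine_factor` sub-steps per
        # coarser step; refine_factor=1 matches the slide's single call.
        for _ in range(refine_factor):
            integrate(level + 1, num_levels, log, regrid_time, refine_factor)
        # Update this level using the result of the finer level.
        log.append(f"update level {level} from level {level + 1}")


trace: List[str] = []
integrate(0, num_levels=3, log=trace)
for entry in trace:
    print(entry)
```

The trace shows that every coarser level is stepped before its finer level, and that updates propagate from the finest level back to level 0.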
Partitioning Adaptive Grid Hierarchies
![Page 5: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/5.jpg)
SAMR 2-D Grid Hierarchy
[Figure: snapshots of the grid hierarchy at time steps 0, 40, 80, 120, 160, and 182. Legend: Level 0, Level 1, Level 2, Level 3, Level 4.]
![Page 6: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/6.jpg)
Partitioning Techniques
• Static or dynamic techniques
• Geometric or non-geometric
• Dynamic partitioning – global or local approaches
• Partitioners for SAMR grid applications
   • Patch-based
   • Domain-based
   • Hybrid
![Page 7: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/7.jpg)
Partitioners Evaluated
• SFC: Space Filling Curve based partitioning
• G-MISP: Geometric Multi-level Inverse Space filling curve Partitioning
• G-MISP+SP: Geometric Multi-level Inverse Space filling curve Partitioning with Sequence Partitioning
• pBD-ISP: p-way Binary Dissection Inverse Space filling curve Partitioning
• SP-ISP: “Pure” Sequence Partitioning with Inverse Space filling curve Partitioning
• WD: Wavefront Diffusion based on global work load
![Page 8: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/8.jpg)
SFC
• Recursive linear representation of multi-dimensional grid hierarchy using space-filling mappings (N-D to 1-D mapping)
• Computational load determined by segment length and recursion level
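As an illustration of an N-D to 1-D mapping, the sketch below orders 2-D block coordinates along a Z-order (Morton) curve. The slides do not say which space-filling curve the SFC partitioner uses, so the choice of Morton order is an assumption, and `morton_index` and `sfc_order` are hypothetical helper names.

```python
from typing import List, Tuple


def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y to get a Z-order (Morton) index."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)       # x bits go to even positions
        index |= ((y >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return index


def sfc_order(blocks: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort block coordinates by their position along the curve."""
    return sorted(blocks, key=lambda b: morton_index(b[0], b[1]))


blocks = [(1, 1), (0, 0), (0, 1), (1, 0)]
print(sfc_order(blocks))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Blocks that are close in the 2-D domain tend to stay close in the resulting 1-D list, so cutting the list into contiguous segments yields spatially compact partitions.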
![Page 9: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/9.jpg)
G-MISP & G-MISP+SP
• G-MISP
   • Multi-level algorithm views matrix of workloads from SAMR grid hierarchy as a one-vertex graph, refined recursively
   • Speed at expense of load balance
• G-MISP+SP
   • “Smarter” variant of G-MISP – uses sequence partitioning to assign consecutive portions of one-dimensional list to processors
   • Load balance improves but scheme is computationally more expensive
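Sequence partitioning, as used by G-MISP+SP, can be sketched as splitting a 1-D list of workloads into at most p consecutive parts so that the heaviest part is as light as possible. The version below is one textbook way to do this (binary search on the bottleneck load) and is not taken from the authors' implementation; it assumes a non-empty list of positive integer loads, and the function names are hypothetical.

```python
from typing import List


def parts_needed(loads: List[int], cap: int) -> int:
    """Count the consecutive parts needed if no part may exceed `cap`."""
    count, acc = 1, 0
    for w in loads:
        if acc + w > cap:
            count += 1
            acc = w
        else:
            acc += w
    return count


def sequence_partition(loads: List[int], p: int) -> List[List[int]]:
    """Split `loads` into at most p consecutive parts, minimizing the max load."""
    lo, hi = max(loads), sum(loads)
    while lo < hi:                      # binary search for the smallest feasible cap
        mid = (lo + hi) // 2
        if parts_needed(loads, mid) <= p:
            hi = mid
        else:
            lo = mid + 1
    parts: List[List[int]] = []
    current: List[int] = []
    acc = 0
    for w in loads:                     # greedy fill with the optimal cap
        if current and acc + w > lo:
            parts.append(current)
            current, acc = [], 0
        current.append(w)
        acc += w
    parts.append(current)
    return parts


print(sequence_partition([4, 3, 2, 6, 1, 5], 3))  # [[4, 3], [2, 6], [1, 5]]
```

The extra search is what makes this kind of scheme more expensive than a single pass over the list, in line with the trade-off noted above.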
![Page 10: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/10.jpg)
pBD-ISP
• Generalization of binary dissection – domain partitioned into p partitions
• Each split divides load as evenly as possible, considering processors
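A minimal sketch of p-way binary dissection on a 1-D list of workloads follows. It assumes that "considering processors" means each split shares the load in proportion to the number of processors assigned to each side; that reading, and the function name `pbd_partition`, are assumptions rather than details given on the slide.

```python
from typing import List


def pbd_partition(loads: List[int], p: int) -> List[List[int]]:
    """Recursively bisect `loads` into up to p consecutive parts."""
    if p == 1 or len(loads) <= 1:
        return [list(loads)]
    p_left = p // 2                           # processors for the left side
    target = sum(loads) * p_left / p          # load the left side should carry
    best_cut, best_gap, prefix = 1, float("inf"), 0
    for i in range(1, len(loads)):            # try every cut position
        prefix += loads[i - 1]
        gap = abs(prefix - target)
        if gap < best_gap:
            best_cut, best_gap = i, gap
    left = pbd_partition(loads[:best_cut], p_left)
    right = pbd_partition(loads[best_cut:], p - p_left)
    return left + right


print(pbd_partition([4, 3, 2, 6, 1, 5], 3))  # [[4, 3], [2, 6], [1, 5]]
```

Because the processor count is split along with the load, p does not have to be a power of two, which is what distinguishes this from plain binary dissection.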
![Page 11: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/11.jpg)
SP-ISP
• Domain sub-divided into p*b equally sized blocks
• Dual-level algorithm - parameter settings for each level
• Fine granularity scheme: good load balance but increased overhead, communication and computational cost
![Page 12: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/12.jpg)
WD
• Part of ParMetis suite based on global workload
• Used for repartitioning graphs with scattered refinements
• Results in fine grain partitionings with jagged boundaries and increased communication costs and overheads
• Metis integration extremely expensive, dedicated SAMR partitioners performed much better
• Two extra steps needed for Metis in our interface
   • Metis graph generated from grid before partitioning
   • Clustering used to regenerate grid blocks from graph partitions after partitioning
![Page 13: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/13.jpg)
Experimental Setup
• Application – RM3D
   • 3-D “real world” compressible turbulence application solving Richtmyer-Meshkov instability
   • Fingering instability which occurs at a material interface accelerated by a shock wave
• Machine – NPACI IBM SP2 Blue Horizon at SDSC
   • Teraflop-scale Power3 based SMP cluster
   • 1152 processors and 512 GB of main memory
   • AIX operating system
   • Peak bi-directional data transfer rate of approx. 115 MBps
![Page 14: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/14.jpg)
Experimental Setup (contd.)
• Base coarse grid – 128 * 32 * 32
• 3 levels of factor 2 space-time refinements
• Application ran for 150 coarse level time-steps
• Experiments consisted of varying –
   • Partitioner (from the set of evaluated partitioners)
   • Number of processors (16 – 128)
   • Granularity, i.e. the atomic unit (2*2*2 – 8*8*8)
• Metrics used – total run-time, maximum load imbalance, AMR efficiency
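The slides do not define the maximum load imbalance metric. The sketch below assumes one common definition, the percentage by which the most loaded processor exceeds the average load; this definition and the name `max_load_imbalance` are assumptions. The sketch also works out the size of the base coarse grid from the dimensions given above.

```python
from typing import List


def max_load_imbalance(loads: List[float]) -> float:
    """Percentage by which the heaviest processor exceeds the average load.

    Assumed definition, not taken from the slides.
    """
    average = sum(loads) / len(loads)
    return (max(loads) - average) / average * 100.0


# Number of cells in the 128 * 32 * 32 base coarse grid.
base_cells = 128 * 32 * 32
print(base_cells)                             # 131072

print(max_load_imbalance([10, 10, 10, 10]))   # 0.0
print(max_load_imbalance([30, 10, 10, 10]))   # 100.0
```

Under this definition the value can exceed 100% whenever one processor carries more than twice the average load.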
![Page 15: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/15.jpg)
Experimental Results
RM3D application on 16 processors with granularity 2

| Partitioner | Run-time (s) | Max. Load Imbalance (%) | AMR Efficiency (%) |
|---|---|---|---|
| SFC | 3315.22 | 1.629 | 72.388 |
| G-MISP | 2931.08 | 55.431 | 77.745 |
| G-MISP+SP | 2805.54 | 5.834 | 77.851 |
| pBD-ISP | 2601.05 | 28.498 | 83.169 |
| SP-ISP | 3136.32 | 204.548 | 82.207 |
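The measurements above can be used to illustrate a simple selection rule: pick the fastest partitioner whose load imbalance stays under a chosen threshold. The figures are copied from the table; the rule itself and the names `RESULTS`, `fastest`, and `select_partitioner` are hypothetical and are not the authors' selection mechanism.

```python
from typing import Dict, Optional, Tuple

# (run-time in s, max. load imbalance in %, AMR efficiency in %)
# for RM3D on 16 processors with granularity 2, from the table above.
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "SFC": (3315.22, 1.629, 72.388),
    "G-MISP": (2931.08, 55.431, 77.745),
    "G-MISP+SP": (2805.54, 5.834, 77.851),
    "pBD-ISP": (2601.05, 28.498, 83.169),
    "SP-ISP": (3136.32, 204.548, 82.207),
}


def fastest(results: Dict[str, Tuple[float, float, float]]) -> str:
    """Name of the partitioner with the lowest run-time."""
    return min(results, key=lambda name: results[name][0])


def select_partitioner(results: Dict[str, Tuple[float, float, float]],
                       max_imbalance: float) -> Optional[str]:
    """Fastest partitioner whose imbalance is at most `max_imbalance`."""
    allowed = {n: r for n, r in results.items() if r[1] <= max_imbalance}
    return fastest(allowed) if allowed else None


print(fastest(RESULTS))                    # pBD-ISP
print(select_partitioner(RESULTS, 10.0))   # G-MISP+SP
```

Tightening the imbalance threshold moves the choice from pBD-ISP to G-MISP+SP and then to SFC, trading run-time for balance.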
![Page 16: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/16.jpg)
Run-times
![Page 17: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/17.jpg)
Max. Load Imbalance
![Page 18: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/18.jpg)
AMR Efficiency
![Page 19: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/19.jpg)
Experimental Evaluation
• RM3D needs rapid refinement and efficient redistribution
• pBD-ISP, G-MISP+SP, SFC best suited for RM3D – fast partitioners with low imbalance that maintain good communication patterns
• pBD-ISP is fastest, but with average load imbalance
• G-MISP+SP and SFC generate lowest imbalance but are relatively slower
• Evaluated partitioning techniques scale reasonably well
![Page 20: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/20.jpg)
Evaluation (contd.)
• Coarse granularity produces high load imbalance
• Fine granularity leads to greater synchronization and coordination overheads and higher execution times
• Optimal partitioning granularity requires a trade-off between execution speed and load imbalance
• For RM3D application, granularity of 4 gives lowest execution time with acceptable load imbalance
![Page 21: An Evaluation of Partitioners for Parallel SAMR Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813f52550346895daa0ec2/html5/thumbnails/21.jpg)
Conclusions
• Experimental evaluation of dynamic domain-based partitioning and load-balancing techniques
• RM3D compressible turbulence application
• Effect of choice of partitioner and granularity on execution time
• Formulation of application-centric characterization of the partitioners as a function of number of processors, problem size, and partitioning granularity