Characterization of Domain-Based Partitioners
for Parallel SAMR Applications
Johan Steensland, Michael Thuné
IT, Dept. of Scientific Computing, Uppsala University, Uppsala, Sweden
Sumir Chandra, Manish Parashar
Dept. of Electrical & Computer Engg., Rutgers, The State University of NJ, Piscataway, NJ, USA
This research was supported by the National Science Foundation and Swedish Foundation for Strategic Research
Overview
• Structured AMR
• Partitioning Adaptive Grid Hierarchies
• Grid Structures
• Characterizing Partitioning Schemes
• SAMR Applications
• Partitioning Techniques
• Experimental Evaluation
• Partitioner Performance
• Octant Approach
• Partitioning Prescriptions
• Towards ARMaDA
• Conclusions
Adaptive Mesh Refinement
• Start with a base coarse grid with minimum acceptable resolution
• Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
• Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
• Resulting grid structure is a dynamic adaptive grid hierarchy
The Berger-Oliger Algorithm
Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step Δt on all grids at level “level”
    If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
    End if
End Recursion
level = 0
Integrate(level)
Structured AMR
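The recursion above can be sketched in Python. This is a minimal illustration that follows the slide's pseudocode literally; it only records the order of operations. The names `integrate`, `run`, `step_count`, and `regrid_interval` are hypothetical, and the regrid test (every `regrid_interval` steps) is an assumption, since the slide does not define RegridTime.

```python
def integrate(level, num_levels, step_count, regrid_interval, log):
    """One recursive pass of Integrate(level), logging each action."""
    # "If (RegridTime) Regrid": assumed here to fire every regrid_interval steps.
    if step_count[level] % regrid_interval == 0:
        log.append(("regrid", level))
    # "Step on all grids at level": a real code would advance the solution here.
    log.append(("step", level))
    step_count[level] += 1
    # "If (level + 1 exists)": recurse into the finer level, then update.
    if level + 1 < num_levels:
        integrate(level + 1, num_levels, step_count, regrid_interval, log)
        log.append(("update", level, level + 1))


def run(num_levels=3, coarse_steps=2, regrid_interval=4):
    """Drive the recursion from level 0, as in 'level = 0; Integrate(level)'."""
    step_count = [0] * num_levels
    log = []
    for _ in range(coarse_steps):
        integrate(0, num_levels, step_count, regrid_interval, log)
    return log


if __name__ == "__main__":
    for entry in run(num_levels=3, coarse_steps=1):
        print(entry)
```

The log shows that each coarse step descends to the finest level before the updates propagate back up, which is why inter-level communication appears on every step.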
Partitioning Adaptive Grid Hierarchies
• Balance load and…
  – Expose available parallelism
  – Minimize communication overheads
    • Inter-level prolongations/restrictions
    • Intra-level “ghost” communications
  – Enable dynamic load redistribution with minimum overheads
• Parallel AMR costs
  – Communications
    • intra-level “ghost” communication
      – along the surface of each block
    • inter-level prolongation/restriction communications
      – gather/scatter between parents/children
  – Grid recomposition
    • grid refinement/coarsening
    • redistribution and load-balancing
    • prolongation
    • data-movement
Grid Structures
• 2-D Grid Structure
[Figure: 2-D grid structure at time steps 0, 40, 80, 120, 160, and 182; legend: Level 0 to Level 4]
Grid Structures (contd.)
• 3-D Grid Hierarchy
Characterizing Partitioning Schemes
• PAC Tuple
  – Run-time selection of partitioning schemes based on system/application parameters
• Evaluation Metrics
  – Communication Requirement
    • inter-level/intra-level communication & memory copies
  – Load Imbalance
    • amount of imbalance
    • effort required
  – Data Migration
    • consider existing distribution
  – Partitioning Time
  – Partitioning Induced Overhead
    • number of grid components
    • quality of grid components
      – size, aspect ratio
• Overview of Distribution Schemes
  – Space-Filling Curves (SFC)
  – Sequence Partitioning (SP)
  – Multi-level Inverse SFC (Vampire)
    • Geometric, binary dissection, parameterized binary dissection
  – Binary Dissection (BD)
  – Wavefront Diffusion (WD - ParMetis)
  – Iterative Tree Balancing (ITB)
  – Combined Grid Distribution (CGD)
  – Independent Grid Distribution (IGD)
  – Independent Level Distribution (ILD)
  – Weighted Distribution
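Two of the evaluation metrics listed above, load imbalance and data migration, can be made concrete with a short sketch. The slides do not give formulas, so the definitions here are assumptions: imbalance is taken as the percentage by which the most loaded processor exceeds the mean load, and migration as the total weight of grid blocks whose owner changes between the existing and the new distribution. The function names are hypothetical.

```python
def load_imbalance(loads):
    """Percent by which the heaviest processor exceeds the mean load.

    Assumed definition (not taken from the slides): 100 * (max - mean) / mean.
    """
    mean = sum(loads) / len(loads)
    return 100.0 * (max(loads) - mean) / mean


def migration_volume(old_owner, new_owner, weights):
    """Total weight of blocks that change processor between two distributions.

    old_owner[i] and new_owner[i] are processor ids for block i;
    weights[i] is the amount of data held by block i.
    """
    return sum(w for o, n, w in zip(old_owner, new_owner, weights) if o != n)


if __name__ == "__main__":
    print(load_imbalance([15, 5, 10, 10]))
    print(migration_volume([0, 0, 1, 1], [0, 1, 1, 0], [4, 2, 3, 1]))
```

Comparing `new_owner` against `old_owner` is what "consider existing distribution" amounts to in this sketch: a partitioning that balances load perfectly can still be poor if it moves most of the data.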
SAMR Applications
• Suite of 5 real-world SAMR application kernels
• Scientific and engineering domains
  – Numerical relativity: scalarwave 2-D & 3-D
  – Oil reservoir simulations: Buckley-Leverett 2-D & 3-D
  – Computational fluid dynamics:
    • Compressible turbulence: rm 2-D
    • Supersonic flows: enoamr 2-D
  – Transport equation: Transport 2-D
• Applications use 3 levels of factor 2 refinements
• Refinements performed every 4 time-steps
• Applications executed for 100 time-steps
Partitioning Techniques
• SFC (ISP)
  – Recursive linear representation of multi-dimensional grid hierarchy generated using space-filling mappings (N-to-1 dimensional mapping)
  – Computational load determined by segment length and recursion level
• G-MISP
  – Multi-level algorithm viewing matrix of workloads from SAMR grid hierarchy as a one-vertex graph, refined recursively
  – Favors speed at expense of load balance
• G-MISP + SP
  – “Smarter” variant of G-MISP: uses sequence partitioning to assign consecutive portions of one-dimensional list to processors
  – Load balance improves but scheme is computationally more expensive
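The idea behind the SFC-based schemes above (map the multi-dimensional domain to a one-dimensional list, then hand consecutive portions of that list to processors) can be sketched as follows. This is an illustration only: it uses a Morton (Z-order) curve because it is short to write, which is an assumption and not necessarily the mapping used by the authors, and the cutting rule is a simple proportional split rather than an optimal sequence partitioner. `morton_key` and `sfc_partition` are hypothetical names.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of (x, y) to get a position on a Z-order curve."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def sfc_partition(blocks, num_procs):
    """Assign blocks to processors as contiguous segments of the curve.

    blocks: dict mapping (x, y) block coordinates to a workload
            (refined regions would carry larger workloads).
    Returns a dict mapping (x, y) to a processor id in [0, num_procs).
    """
    order = sorted(blocks, key=lambda c: morton_key(*c))
    total = sum(blocks.values())
    owner, running = {}, 0.0
    for cell in order:
        # Place each block by the midpoint of its span along the 1-D list.
        mid = running + blocks[cell] / 2
        owner[cell] = min(int(mid * num_procs / total), num_procs - 1)
        running += blocks[cell]
    return owner


if __name__ == "__main__":
    grid = {(x, y): 1 for x in range(4) for y in range(4)}
    owner = sfc_partition(grid, 4)
    for y in range(4):
        print(" ".join(str(owner[(x, y)]) for x in range(4)))
```

On a uniform 4 x 4 grid with four processors each processor receives one 2 x 2 quadrant, which illustrates how locality along the curve translates into compact partitions.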
Partitioning Techniques (contd.)
• pBD-ISP
  – Generalization of binary dissection: domain partitioned into p partitions
  – Each split divides load as evenly as possible, considering processors
• SP
  – Domain sub-divided into p*b equally sized blocks
  – Dual-level algorithm enabling different parameter settings for each level
  – Fine granularity scheme: good load balance but increased overhead, communication and computational cost
• WD
  – Part of ParMetis suite based on global workload and specializes in repartitioning graphs where refinements are scattered
  – Scheme results in fine grain partitionings with jagged boundaries and increased communication costs and overheads
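A binary dissection in which each split divides the load in proportion to the processors on either side can be sketched as below. This is written in the spirit of the pBD description above and is not the authors' pBD-ISP implementation: it works directly on a 2-D array of workloads, always cuts across the longer side, and searches all cut positions by brute force. The names `region_load` and `dissect` are hypothetical.

```python
def region_load(load, x0, x1, y0, y1):
    """Sum of workloads in the half-open rectangle [x0, x1) x [y0, y1)."""
    return sum(load[y][x] for y in range(y0, y1) for x in range(x0, x1))


def dissect(load, x0, x1, y0, y1, procs, first_proc=0, out=None):
    """Recursively bisect a rectangle until each piece maps to one processor.

    Returns a dict: processor id -> (x0, x1, y0, y1).
    """
    if out is None:
        out = {}
    if procs == 1:
        out[first_proc] = (x0, x1, y0, y1)
        return out
    left_procs = procs // 2
    # Target load for the left part, proportional to its processor share.
    target = region_load(load, x0, x1, y0, y1) * left_procs / procs
    split_x = (x1 - x0) >= (y1 - y0)  # cut across the longer side
    lo, hi = (x0, x1) if split_x else (y0, y1)
    if hi - lo < 2:
        raise ValueError("region too small to split further")

    def left_load(cut):
        if split_x:
            return region_load(load, x0, cut, y0, y1)
        return region_load(load, x0, x1, y0, cut)

    # Choose the cut whose left-hand load is closest to the target share.
    cut = min(range(lo + 1, hi), key=lambda c: abs(left_load(c) - target))
    right_procs = procs - left_procs
    if split_x:
        dissect(load, x0, cut, y0, y1, left_procs, first_proc, out)
        dissect(load, cut, x1, y0, y1, right_procs, first_proc + left_procs, out)
    else:
        dissect(load, x0, x1, y0, cut, left_procs, first_proc, out)
        dissect(load, x0, x1, cut, y1, right_procs, first_proc + left_procs, out)
    return out


if __name__ == "__main__":
    strip = [[1, 1, 1, 1, 1, 1]]
    print(dissect(strip, 0, 6, 0, 1, procs=3))
```

With three processors the first cut gives one third of the load to the left part and two thirds to the right, which is what allows p to be any number rather than a power of two.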
Experimental Evaluation
• Normalized results for Scalarwave and Buckley-Leverett applications
Experimental Evaluation (contd.)
Scheme      LB               Comm.            DM               OH               Speed
G-MISP      - o - - - - -    o o o o o o o    o o o o o o o    o o o o o o o    +
G-MISP+SP   + + + o + o +    o o o o o o o    o o o o o o o    o o o o o o o    o
pBD-ISP     o o o - o - -    + + + o + o o    + o + o + o +    + + + + + + +    +
SP          + + - - o - o    - o o + o + o    o o o + - - o    - o - - o - o    -
ISP         - - o - - - -    - - o - o - -    - - o o - o o    o + + + + o +    +
(+) Significantly better   (o) Average   (-) Significantly worse
LB: load balance, Comm.: communication, DM: data migration, OH: overhead. Each metric column holds seven ratings, apparently one per application configuration; their order is not recoverable from the slide text.
Partitioner Performance
Performance summary of observed results
• G-MISP
  – Fast, load balance not optimized, average overall performance
• G-MISP+SP
  – Similar to G-MISP, better load balance, higher computational costs
• pBD-ISP
  – Good overall performance, very fast, small communications and data movement, average load balance
• SP
  – Computationally very expensive, unpredictable behavior, worse load balance than G-MISP+SP
Partitioner Performance (contd.)
• ISP
  – Very fast, generates low overhead, below average load balance, higher communication, similar to those of G-MISP
• WD
  – Metis integration extremely expensive, dedicated SAMR partitioners performed much better
  – Even though Metis is known to produce high-quality partitionings at a low cost, two extra steps were needed in our interface
  – Metis graph generated from grid before partitioning, clustering used to regenerate grid blocks from graph partitions after partitioning
Octant Approach
• Used to classify the state of the SAMR application with respect to
  – Adaptation pattern (scattered or localized)
  – Whether run-time is dominated by computation or communication
  – Activity dynamics in the solution
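Three two-valued criteria give 2 x 2 x 2 = 8 possible application states, hence "octants". The sketch below illustrates this under loud assumptions: the three indicators, their normalization to [0, 1], and the 0.5 threshold are all hypothetical, and the slides do not say which combination corresponds to which octant number, so no numbering is assigned here.

```python
from itertools import product
from typing import NamedTuple


class AppState(NamedTuple):
    scattered: bool       # adaptation pattern: scattered (True) or localized (False)
    comm_dominated: bool  # run-time dominated by communication (True) or computation (False)
    high_activity: bool   # activity dynamics in the solution: high (True) or low (False)


def classify(scatter, comm_fraction, activity, threshold=0.5):
    """Threshold three hypothetical indicators, each normalized to [0, 1]."""
    return AppState(scatter > threshold,
                    comm_fraction > threshold,
                    activity > threshold)


# All eight combinations of the three binary criteria.
ALL_STATES = [AppState(*bits) for bits in product([False, True], repeat=3)]

if __name__ == "__main__":
    print(len(ALL_STATES))
    print(classify(0.9, 0.2, 0.7))
```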
Partitioning Prescriptions
• Association of partitioning techniques to application state octants
Octant Scheme
I pBD-ISP
II pBD-ISP
III G-MISP+SP
IV G-MISP+SP, SP
V pBD-ISP
VI pBD-ISP
VII G-MISP+SP
VIII G-MISP+SP
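The table above maps directly onto a lookup structure that a run-time selector could consult. The sketch below encodes the table as given; the function `select_partitioner` and its `available` parameter are hypothetical, and the choice to prefer the first listed scheme when an octant has two entries is an assumption.

```python
# Octant -> recommended schemes, copied from the prescription table.
PRESCRIPTIONS = {
    "I": ["pBD-ISP"],
    "II": ["pBD-ISP"],
    "III": ["G-MISP+SP"],
    "IV": ["G-MISP+SP", "SP"],
    "V": ["pBD-ISP"],
    "VI": ["pBD-ISP"],
    "VII": ["G-MISP+SP"],
    "VIII": ["G-MISP+SP"],
}


def select_partitioner(octant, available=None):
    """Return the first prescribed scheme for an octant.

    available: optional set of scheme names that are actually installed;
               if given, the first prescribed scheme found in it is returned.
    """
    candidates = PRESCRIPTIONS[octant]
    if available is None:
        return candidates[0]
    for name in candidates:
        if name in available:
            return name
    raise LookupError(f"no prescribed partitioner available for octant {octant}")


if __name__ == "__main__":
    print(select_partitioner("IV"))
    print(select_partitioner("IV", available={"SP"}))
```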
Towards ARMaDA
• ARMaDA – Adaptive Runtime Management of Dynamic Applications
• “Best” partitioning depends on application/system configuration and current application/system state
  – Application Sensitive Adaptation
    • Partitioning Scheme: Vampire (MISP), GrACE (SFC), ParMetis (WD), RSB, ITB, etc.
    • Granularity: Patch size: AMR efficiency, comm./comp. ratio, overhead, node-performance, load-balance, etc.
    • Number of Processors/Load on Processors: Dynamic allocations/configuration/management (1000+ processors from the beginning or “on-demand”, hierarchical decomposition using dynamic processor groups)
  – System Sensitive Adaptation
    • Availability of system resources
    • State of system resources: SNMP, NWS, REMOS
    • Heterogeneity
Towards ARMaDA (contd.)
• Adaptive meta-partitioner
  – Dynamic PAC tuple
Conclusions
• Application-centric characterization of domain-based partitioners
  – Partitioning quality determined by a 5-component metric
  – 6 partitioning schemes evaluated using 5 application kernels
  – Mapping of partitioners onto application state octants
  – Octant approach and dynamic PAC tuple
• Overall goal
  – Support the formulation of policies required to drive a dynamically adaptive meta-partitioner for SAMR grid hierarchies
  – Selection of most appropriate partitioning strategy at run-time, based on current application and system state
  – Decrease in overall execution time
  – ARMaDA: Adaptive Run-time Management of Dynamic Applications