
Page 1

Flexible, Scalable Mesh and Data Management using PETSc DMPlex

M. Lange¹, M. Knepley², L. Mitchell³, G. Gorman¹

¹ AMCG, Imperial College London
² Computation Institute, University of Chicago
³ Computing, Imperial College London

June 16, 2015


Page 2

Motivation

Unstructured Mesh Management

Parallel Mesh Distribution

Firedrake

Summary


Page 3

Motivation: Mesh management

- Many tasks are common across applications: mesh input, partitioning, checkpointing, ...
- File I/O can become a severe bottleneck!

Mesh file formats
- Range of mesh generators and formats: Gmsh, Cubit, Triangle, ExodusII, Fluent, CGNS, ...
- No universally accepted format
- Applications often "roll their own"
- No interoperability between codes


Page 4

Motivation: Interoperability and extensibility

- Abstract mesh topology interface, provided by a widely used library¹
- Extensible support for multiple formats:
  - Single point for extension and optimisation
  - Many applications inherit capabilities
- Mesh management optimisations:
  - Scalable read/write routines
  - Parallel partitioning and load-balancing
  - Mesh renumbering techniques
  - Unstructured mesh adaptivity

Finding the right level of abstraction

¹ J. Brown, M. Knepley, and B. Smith. Run-time extensibility and librarization of simulation software. IEEE Computing in Science and Engineering, 2015.


Page 5

Motivation

Unstructured Mesh Management

Parallel Mesh Distribution

Firedrake

Summary


Page 6

Unstructured Mesh Management

DMPlex - Unstructured mesh API¹

- Abstract mesh connectivity as a Directed Acyclic Graph (DAG)²
- Dimensionless access:
  - Topology separate from discretisation
  - Preallocation and preconditioners
- PetscSection:
  - Describes irregular data arrays (CSR)
  - Maps DAG points to DoFs
- PetscSF: Star Forest³
  - One-sided description of shared data
  - Performs sparse data communication

(A petsc4py sketch of the DAG access follows the figure below.)

[Figure: DAG (Hasse diagram) representation of a small mesh, points numbered 0-14, showing the layered covering relation between cells, edges, and vertices.]
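To make the dimensionless access concrete, here is a small petsc4py sketch; it is not from the slides, and the createBoxMesh signature has varied across petsc4py releases:

    # Sketch: traversing the DMPlex DAG with dimension-independent queries.
    # Assumes a recent petsc4py build; createBoxMesh arguments vary by version.
    from petsc4py import PETSc

    # Small 2D simplicial mesh (interpolated: cells, edges, and vertices).
    dm = PETSc.DMPlex().createBoxMesh([2, 2], simplex=True)

    pStart, pEnd = dm.getChart()            # all DAG points, one contiguous range
    cStart, cEnd = dm.getHeightStratum(0)   # cells    (height 0)
    vStart, vEnd = dm.getDepthStratum(0)    # vertices (depth 0)

    c = cStart
    print("cone of cell", c, ":", dm.getCone(c))          # points one level down
    v = vStart
    print("support of vertex", v, ":", dm.getSupport(v))  # points one level up
    points, orients = dm.getTransitiveClosure(c)          # full closure of a cell

The same loop works for any dimension and any cell shape, since only the DAG structure is queried.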

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215-230, August 2009.
² A. Logg. Efficient representation of computational meshes. Int. Journal of Computational Science and Engineering, 4:283-295, 2009.
³ J. Brown. Star forests as a parallel communication model, 2011.


Page 7

Unstructured Mesh Management

DMPlex - PETSc's unstructured mesh API¹

- Input: ExodusII, Gmsh, CGNS, Fluent-CAS, MED, ...
- Output: HDF5 + Xdmf
  - Visualizable checkpoints
- Parallel distribution:
  - Partitioners: Chaco, Metis/ParMetis
  - Automated halo exchange via PetscSF
- Mesh renumbering:
  - Reverse Cuthill-McKee (RCM)

(A petsc4py sketch of this I/O and distribution workflow follows the figure below.)

[Figure: the same DAG diagram as on Page 6.]
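A minimal petsc4py sketch of this slide's workflow; the file names are placeholders, and HDF5 output requires a PETSc build with HDF5 enabled:

    from petsc4py import PETSc

    # Read a mesh; the format is inferred from the file extension.
    dm = PETSc.DMPlex().createFromFile("mesh.msh")

    # Distribute across ranks; the partitioner is chosen via options,
    # e.g. -petscpartitioner_type parmetis.
    dm.distribute(overlap=0)

    # Checkpoint to HDF5; an Xdmf description makes it visualizable.
    viewer = PETSc.Viewer().createHDF5("mesh.h5", mode="w")
    dm.view(viewer)
    viewer.destroy()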

¹ M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215-230, August 2009.


Page 8

Motivation

Unstructured Mesh Management

Parallel Mesh Distribution

Firedrake

Summary


Page 9

Parallel Mesh Distribution

DMPlexDistribute¹

- Mesh partitioning:
  - Topology-based partitioning
  - Metis/ParMetis, Chaco
- Mesh and data migration:
  - One-to-all and all-to-all
  - Data migration via SF
- Parallel overlap computation:
  - Generic N-level point overlap
  - FVM and FEM adjacency

[Figure: a mesh distributed into Partition 0 and Partition 1.]

¹ M. Knepley, M. Lange, and G. Gorman. Unstructured overlapping mesh distribution in parallel. Submitted to ACM TOMS, 2015.


Page 10

Parallel Mesh Distribution

def DMPlexDistribute(dm, overlap):
    # Derive migration pattern from partition
    DMLabel partition = PetscPartition(partitioner, dm)
    PetscSF migration = PartitionLabelCreateSF(dm, partition)

    # Initial non-overlapping migration
    DM dmParallel = DMPlexMigrate(dm, migration)
    PetscSF shared = DMPlexCreatePointSF(dmParallel, migration)

    # Parallel overlap generation
    DMLabel overlapLabel = DMPlexCreateOverlap(dmParallel, overlap, shared)
    PetscSF overlapSF = PartitionLabelCreateSF(dmParallel, overlapLabel)
    DM dmOverlap = DMPlexMigrate(dmParallel, overlapSF)

- Two-phase distribution enables parallel overlap generation
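From petsc4py, the whole two-phase pipeline above is reached through a single call; a minimal sketch, assuming a parallel run with the partitioner configured via options:

    from petsc4py import PETSc

    dm = PETSc.DMPlex().createBoxMesh([16, 16], simplex=True)
    sf = dm.distribute(overlap=1)   # partition, migrate, then grow a 1-level overlap
    # Equivalently, in two explicit phases:
    #   dm.distribute(overlap=0)
    #   dm.distributeOverlap(overlap=1)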


Page 11

Parallel Mesh Distribution: strong scaling performance

- Cray XC30 with 4920 nodes; 2× 12-core E5-2697 @ 2.7 GHz¹
- 128³ unit cube with ≈ 12 million cells

[Plot: One-to-all distribution. Time [sec] (log scale) vs number of processors (2-96) for Distribute and Overlap, each broken down into Partition and Migration phases.]

[Plot: All-to-all redistribution. Time [sec] (log scale) vs number of processors (2-96) for Redistribute, broken down into Partition and Migration phases.]

- Remaining bottleneck: sequential partitioning

¹ ARCHER: www.archer.ac.uk


Page 12

Parallel Mesh Distribution: regular parallel refinement

- Distribute coarse mesh and refine in parallel
- Generate overlap on resulting fine mesh (see the sketch after the plot below)

[Plot: time [sec] (log scale) vs number of processors (2-96) for the Overlap, Distribute, Generate, and Refine phases, comparing m=128/refine 0, m=64/refine 1, and m=32/refine 2.]
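A petsc4py sketch of this strategy; the mesh sizes here are illustrative, not the benchmark's:

    from petsc4py import PETSc

    dm = PETSc.DMPlex().createBoxMesh([32, 32, 32], simplex=True)  # coarse mesh
    dm.distribute(overlap=0)         # cheap: few cells per rank at this point
    dm.setRefinementUniform(True)
    dm = dm.refine()                 # regular refinement runs fully in parallel
    dm.distributeOverlap(overlap=1)  # overlap computed on the fine mesh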


Page 13

Motivation

Unstructured Mesh Management

Parallel Mesh Distribution

Firedrake

Summary


Page 14

Firedrake

Firedrake - Automated Finite Element computation¹

- Re-envision FEniCS²

Timestepping scheme:

\phi^{n+1/2} = \phi^n - \tfrac{\Delta t}{2}\, p^n

p^{n+1} = p^n + \Delta t\, \frac{\int_\Omega \nabla\phi^{n+1/2} \cdot \nabla v \, dx}{\int_\Omega v \, dx} \qquad \forall v \in V

\phi^{n+1} = \phi^{n+1/2} - \tfrac{\Delta t}{2}\, p^{n+1}

where \nabla\phi \cdot n = 0 on \Gamma_N and p = \sin(10\pi t) on \Gamma_D.

from firedrake import *
mesh = Mesh("wave_tank.msh")
V = FunctionSpace(mesh, 'Lagrange', 1)
p = Function(V, name="p")
phi = Function(V, name="phi")
u = TrialFunction(V)
v = TestFunction(V)
p_in = Constant(0.0)
bc = DirichletBC(V, p_in, 1)
T = 10.
dt = 0.001
t = 0
while t <= T:
    p_in.assign(sin(2*pi*5*t))
    phi -= dt / 2 * p
    p += assemble(dt * inner(grad(v), grad(phi)) * dx) \
        / assemble(v * dx)
    bc.apply(p)
    phi -= dt / 2 * p
    t += dt

¹ F. Rathgeber, D. Ham, L. Mitchell, M. Lange, F. Luporini, A. McRae, G. Bercea, G. Markall, and P. Kelly. Firedrake: Automating the finite element method by composing abstractions. Submitted to ACM TOMS, 2015.
² A. Logg, K.-A. Mardal, and G. Wells. Automated Solution of Differential Equations by the Finite Element Method. Springer, 2012.


Page 15

Firedrake

Firedrake - Automated Finite Element computation

- Implements UFL¹
- Outer framework in Python
- Run-time C code generation
- PyOP2: assembly kernel execution framework
- Domain topology from DMPlex:
  - Mesh generation and file I/O
  - Derive discretisation-specific mappings at run-time
- Geometric multigrid

[Diagram: Firedrake toolchain. The Firedrake/FEniCS language (Unified Form Language) expresses the FEM problem (weak form PDE); a modified FFC with FIAT generates local assembly kernels (AST), optimised by the COFFEE AST optimiser; the PyOP2 interface (data structures: Set, Map, Dat) schedules parallel loops of kernels over the mesh, generating explicitly parallel, hardware-specific implementations for CPU (OpenMP/OpenCL), GPU (PyCUDA/PyOpenCL), and future architectures; meshes, matrices, vectors, geometry, and (non)linear solves go through PETSc4py (KSP, SNES, DMPlex) with MPI. Each layer has its expert: the domain specialist writes the mathematical model, the numerical analyst the FEM kernel generation, and the parallel programming expert the hardware architectures and optimisation.]

¹ M. Alnæs, A. Logg, K. Ølgaard, M. Rognes, and G. Wells. Unified Form Language: A domain-specific language for weak formulations of partial differential equations. ACM Transactions on Mathematical Software (TOMS), 40(2):9, 2014.


Page 16

Firedrake

Firedrake - Data structures

- DMPlex encodes topology:
  - Parallel distribution
  - Application ordering
- Section encodes discretisation:
  - Maps DAG to solution DoFs
  - Generated via FIAT element¹
  - Derives PyOP2 indirection maps for assembly
- SF performs halo exchange:
  - DMPlex derives SF from section and overlap (see the sketch after the diagram below)

[Diagram: Firedrake data structures. Mesh topology is held by DMPlex (file I/O, partitioning, distribution, topology, renumbering), linked through an IS permutation to the discretisation layer (FunctionSpace: FIAT.Element, PetscSection mapping DAG to DoF, PyOP2.Map mapping cell to DoF), the halo (PetscSF), and the data layer (Function/Vec with local and remote parts).]
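A petsc4py sketch of this coupling, from Section to halo exchange; setLocalSection follows recent petsc4py, while older releases called it setDefaultSection:

    from petsc4py import PETSc

    dm = PETSc.DMPlex().createBoxMesh([4, 4], simplex=True)
    dm.distribute(overlap=1)

    pStart, pEnd = dm.getChart()
    vStart, vEnd = dm.getDepthStratum(0)   # P1 layout: one DoF per vertex

    sec = PETSc.Section().create()
    sec.setChart(pStart, pEnd)
    for v in range(vStart, vEnd):
        sec.setDof(v, 1)
    sec.setUp()
    dm.setLocalSection(sec)

    # DMPlex derives the halo SF from section + overlap and sizes the vectors.
    gvec = dm.createGlobalVec()
    lvec = dm.createLocalVec()
    dm.globalToLocal(gvec, lvec)   # halo exchange: owners fill ghost copies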

¹ R. Kirby. FIAT, a new paradigm for computing finite element basis functions. ACM Transactions on Mathematical Software (TOMS), 30(4):502-516, 2004.


Page 17

Firedrake

PyOP2 - Kernel execution

- Run-time code generation:
  - Intermediate representation
  - Kernel optimisation via AST¹
- Overlapping communication:
  - Core: execute immediately
  - Non-core: halo-dependent
  - Halo: communicate while computing over core
- Imposes ordering constraint (sketched schematically below)

[Figure: Partition 0 and Partition 1 with core, non-core, and halo regions.]
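The resulting schedule can be sketched schematically; the names below are hypothetical placeholders standing in for PyOP2's internal machinery, not its actual API:

    # Schematic of PyOP2's communication/computation overlap (hypothetical names).
    def execute_parallel_loop(kernel, core_entities, non_core_entities, halo):
        halo.exchange_begin()         # start sending owned values to neighbours
        for e in core_entities:       # core entities touch no halo data,
            kernel(e)                 #   so they compute immediately
        halo.exchange_end()           # wait for remote data to arrive
        for e in non_core_entities:   # halo-dependent entities run last,
            kernel(e)                 #   which imposes the ordering constraint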

¹ F. Luporini, A. Varbanescu, F. Rathgeber, G.-T. Bercea, J. Ramanujam, D. Ham, and P. Kelly. Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly. Accepted for publication, ACM Transactions on Architecture and Code Optimization, 2015.


Page 18

Firedrake

Firedrake - RCM reordering

- Mesh renumbering:
  - Improves cache locality
  - Reverse Cuthill-McKee (RCM)
- Combine RCM with PyOP2 ordering¹ (see the sketch after the figure below):
  - Filter cell reordering
  - Apply within PyOP2 classes
  - Add DoFs per cell (closure)

[Figure: mesh numbering patterns, Native vs RCM ordering, for sequential and parallel runs.]
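A petsc4py sketch of the RCM step, assuming the getOrdering/permute bindings present in recent petsc4py:

    from petsc4py import PETSc

    dm = PETSc.DMPlex().createBoxMesh([16, 16], simplex=True)
    perm = dm.getOrdering(PETSc.Mat.OrderingType.RCM)  # IS with the new point order
    dm = dm.permute(perm)                              # renumbered copy of the mesh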

¹ M. Lange, L. Mitchell, M. Knepley, and G. Gorman. Efficient mesh management in Firedrake using PETSc-DMPlex. Submitted to SISC Special Issue, 2015.


Page 19

Firedrake Performance: indirection cost of assembly loops

- Cell integral: L = u * dx
- Facet integral: L = u('+') * dS
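As a minimal Firedrake sketch of these two benchmark forms (the mesh and degree are illustrative; the P3 runs use degree 3):

    from firedrake import *

    mesh = UnitSquareMesh(64, 64)
    V = FunctionSpace(mesh, 'Lagrange', 1)
    u = Function(V)

    L_cell = u * dx          # cell integral: direct cell -> DoF indirection
    L_facet = u('+') * dS    # interior facet integral: facet -> cell -> DoF
    val_cell = assemble(L_cell)    # assembling these functionals repeatedly
    val_facet = assemble(L_facet)  # measures the pure gather/indirection cost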

[Plots: 100 assembly loops on P1 (left) and P3 (right). Time [sec] (log scale) vs number of processors (1-96) for cell and facet integrals, comparing RCM and native ordering.]


Page 20

Firedrake Performance: advection-diffusion

- Equation: \partial c/\partial t + \nabla\cdot(\vec{u}\,c) = \nabla\cdot(\kappa\nabla c)
- L-shaped mesh with ≈ 3.1 million cells
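The corresponding UFL forms might be written as below; the velocity, diffusivity, and mesh are illustrative placeholders, not the benchmark setup:

    from firedrake import *

    mesh = UnitSquareMesh(64, 64)
    V = FunctionSpace(mesh, 'Lagrange', 1)
    c = TrialFunction(V)
    v = TestFunction(V)
    u_vel = Constant((1.0, 0.0))   # advecting velocity (placeholder)
    kappa = Constant(0.1)          # diffusivity (placeholder)

    a_adv = v * div(u_vel * c) * dx                  # advection operator
    a_diff = inner(kappa * grad(c), grad(v)) * dx    # diffusion operator
    A_adv = assemble(a_adv)    # "matrix assembly" timings cover calls like these
    A_diff = assemble(a_diff)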

[Plots: matrix assembly on P1 (left) and P3 (right). Time [sec] (log scale) vs number of processors (1-96) for the advection and diffusion operators, comparing RCM and native ordering.]


Page 21

Firedrake Performance: advection-diffusion

- Equation: \partial c/\partial t + \nabla\cdot(\vec{u}\,c) = \nabla\cdot(\kappa\nabla c)
- L-shaped mesh with ≈ 3.1 million cells

[Plots: RHS assembly on P1 (left) and P3 (right). Time [sec] (log scale) vs number of processors (1-96) for advection and diffusion, comparing RCM and native ordering.]


Page 22

Firedrake Performance: advection-diffusion

- Advection solver: CG + Jacobi
- Diffusion solver: CG + HYPRE BoomerAMG
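As a sketch, the diffusion configuration maps onto Firedrake solver parameters as below; the problem itself is a stand-in, and only the solver option names come from the slide:

    from firedrake import *

    mesh = UnitSquareMesh(64, 64)
    V = FunctionSpace(mesh, 'Lagrange', 1)
    c = TrialFunction(V)
    v = TestFunction(V)
    kappa = Constant(0.1)

    a = inner(kappa * grad(c), grad(v)) * dx
    L = v * dx
    bc = DirichletBC(V, 0.0, 'on_boundary')
    sol = Function(V)

    # Diffusion: CG Krylov solver preconditioned with HYPRE BoomerAMG.
    solve(a == L, sol, bcs=bc,
          solver_parameters={'ksp_type': 'cg',
                             'pc_type': 'hypre',
                             'pc_hypre_type': 'boomeramg'})
    # The advection solve instead uses {'ksp_type': 'cg', 'pc_type': 'jacobi'}.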

[Plots: solve time on P1 (left) and P3 (right). Time [sec] (log scale) vs number of processors (1-96) for the advection and diffusion solves, comparing RCM and native ordering.]


Page 23

Motivation

Unstructured Mesh Management

Parallel Mesh Distribution

Firedrake

Summary


Page 24

Summary

DMPlex mesh management

- Unified mesh reader/writer interface
- Improved DMPlexDistribute:
  - Scalable overlap generation
  - All-to-all load balancing
- Firedrake FE environment:
  - DMPlex as topology abstraction
  - Derive indirection maps for assembly
  - Compact RCM renumbering

Future work

- Parallel mesh file reads
- Anisotropic mesh adaptivity


Page 25

Thank You

www.firedrakeproject.org www.fenicsproject.org
