data structures and methods for unstructured distributed ...roystgnr/parmesh_talk.pdf · 1...

32
PECOS Predictive Engineering and Computational Sciences Data Structures and Methods for Unstructured Distributed Meshes Roy H. Stogner The University of Texas at Austin May 23, 2012 Roy H. Stogner Distributed Meshes May 23, 2012 1 / 19

Upload: others

Post on 15-Mar-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

PECOSPredictive Engineering and Computational Sciences

Data Structures and Methods for UnstructuredDistributed Meshes

Roy H. Stogner

The University of Texas at Austin

May 23, 2012

Roy H. Stogner Distributed Meshes May 23, 2012 1 / 19

Page 2: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

1 Introduction2 Data Structures

Unstructured MeshingDistributed Meshing

3 AlgorithmsParallel UtilitiesParallel Mesh Manipulation

4 ApplicationsPerformanceChallenges

5 Conclusions

Roy H. Stogner Distributed Meshes May 23, 2012 2 / 19

Page 3: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Introduction

Context and Motivations

PetscMetis

Mpich

Laspack

LibMesh

STL

RBM

TumorAngiogenesis

CompressibleNS

DD

Elem

NodeElem FaceEdge Cell InfQuad InfCell

Prism Hex Pyramid Tet

Hex8 Hex20 Hex27

DofObject

Node

libMesh design• C++ library classes

I Hybrid parallel FEM calculationsI Unstructured adapted meshesI Abstract base classes like Elem,

FEBaseI Concrete leaf classes like Hex27,

FEHierarchic

ParallelMesh design goals• Eliminate serialized mesh bottleneck

• O (NE/NP ) memory use

• Interchangeable with Mesh (nowSerialMesh)

• Incremental design and testing

Roy H. Stogner Distributed Meshes May 23, 2012 3 / 19

Page 4: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Data Structures Unstructured Meshing

Unstructured Mesh Data: Elements, Nodes

• Heap memory allocation formixed geometric, finite elementtypes

• Mesh provides iterators, randomaccess

• Each element provides pointersto neighbors, nodes

• Adaptive refinement addsparents, children

• Boundary adds null neighbors

Roy H. Stogner Distributed Meshes May 23, 2012 4 / 19

Page 5: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Data Structures Unstructured Meshing

Unstructured Mesh Data: Degrees of Freedom

Indexing Requirements• Multiple systems of equations on each Mesh

• Multiple variables in each system

• (Varying) multiple components to each variable

• Extra components on some hanging node types

• Second-order geometric nodes give topologicalstorage for higher-order finite elements

Roy H. Stogner Distributed Meshes May 23, 2012 5 / 19

Page 6: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Data Structures Distributed Meshing

Mesh Class Hierarchy

MeshBase

+boundary_info: AutoPtr<BoundaryInfo>

+elem(unsigned int)

+node(unsigned int)

+elements_begin/end()

+nodes_begin/end()

+active_local_elements_begin/end()

+read()

+write()

ParallelMesh

#_elements: mapvector<Elem*>

+allgather()

+delete_remote_elements()

UnstructuredMesh

+find_neighbors()

+read/write()

+add/delete_elem()

SerialMesh

#_elements: vector<Elem*>

+elements_begin()

CartesianMesh

Object-Oriented Design• Predicated iterators for

traversing subsets of elements,nodes

• Abstract iterator interface hidesmesh type from mostapplications

• UnstructuredMesh ”branch” formost library code

• ParallelMesh leaf implementsdistributed storage,synchronization

Roy H. Stogner Distributed Meshes May 23, 2012 6 / 19

Page 7: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Data Structures Distributed Meshing

Distributed Mesh Data Structures

SerialMesh• Contiguous vector of pointers to Elem, Node

• elements[i]->id() == i

• O (1) sequential traversal

• O (1) random access

ParallelMesh• map (binary tree) or unsorted map (hash table)

• elements[i]->id() == i

• O (1) (sequential or unsorted) traversal

• O (logNE/NP ) or O (1) random access

Roy H. Stogner Distributed Meshes May 23, 2012 7 / 19

Page 8: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Data Structures Distributed Meshing

Distributed Mesh Data Structures - RemoteElem

• Boundary neighbors still point toNULL

• Neighbors owned by remoteranks are “ghost” elements withfull data

• Neighbors which exist only onremote ranks point toremote elem

• O (1) memory, tests

• Code “walking” past ghostelements triggers runtime errors

Roy H. Stogner Distributed Meshes May 23, 2012 8 / 19

Page 9: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

MPI interfacestd::vector<Real> send, recv;

...

if (dest processor id == libMesh::processor id() &&

source processor id == libMesh::processor id())

recv = send;

#ifdef HAVE MPI

else

unsigned int sendsize = send.size(), recvsize;

MPI Status status;

MPI Sendrecv(&sendsize, 1, datatype<unsigned int>(),

dest processor id, 0,

&recvsize, 1, datatype<unsigned int>(),

source processor id, 0,

libMesh::COMM WORLD,

&status);

recv.resize(recvsize);

MPI Sendrecv(sendsize ? &send[0] : NULL, sendsize, MPI DOUBLE,

dest processor id, 0,

recvsize ? &recv[0] : NULL, recvsize, MPI DOUBLE,

source processor id, 0,

libMesh::COMM WORLD,

&status);

#endif // HAVE MPI

Roy H. Stogner Distributed Meshes May 23, 2012 9 / 19

Page 10: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

Parallel:: interfacestd::vector<Real> send, recv;

...

Parallel::send receive(dest processor id, send,

source processor id, recv);

Advantages• Recompilable without MPI• MPI implementation optimizable “under the hood”

I MPI probe for vector lengthsI Asynchronous I/O

• Data type independent

• Container type independent

Roy H. Stogner Distributed Meshes May 23, 2012 10 / 19

Page 11: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

Parallel:: algorithmstemplate <typename Iterator,

typename SyncFunctor> void

sync dofobject data by id

(const Iterator& range begin,

const Iterator& range end,

SyncFunctor& sync);

Roy H. Stogner Distributed Meshes May 23, 2012 11 / 19

Page 12: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

Advantages• Generic communication for

Elements, Nodes

• SyncFunctor::act on dataupdates local mesh subset

• Communications algorithm isindependent of functor

Roy H. Stogner Distributed Meshes May 23, 2012 11 / 19

Page 13: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

Round-Robin• Processor P sends to processorP +K while receiving fromP −K

• Old data discarded after eachstep - O (NG/NP ) in transit

• O (NG) worst-case executiontime

Roy H. Stogner Distributed Meshes May 23, 2012 11 / 19

Page 14: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Utilities

Utility Functions - Parallel::

Future optimizations• Precalculate, skip

non-neighboring processors

• Queue multiple asynchronoussends at a time

Roy H. Stogner Distributed Meshes May 23, 2012 11 / 19

Page 15: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Mesh Distribution

Mesh Creation• File 1-to-N input, serial generation

I Remote elements must be removedI One pass: flag nodes on local elementsI Second pass: delete disconnected

elements, replace with remote elem

• File N-to-N input

I Ghost elements, nodes must be gatheredfrom neighboring processors

I Local boundary node id lists sent toadjacent processors

I Ghost nodes, elements received in return

• Parallel generation

I Ids i : i mod (NP + 1) = p are ownedby processor p

I Very algorithm-dependent

Roy H. Stogner Distributed Meshes May 23, 2012 12 / 19

Page 16: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Mesh Distribution

Mesh Creation• File 1-to-N input, serial generation

I Remote elements must be removedI One pass: flag nodes on local elementsI Second pass: delete disconnected

elements, replace with remote elem

• File N-to-N input

I Ghost elements, nodes must be gatheredfrom neighboring processors

I Local boundary node id lists sent toadjacent processors

I Ghost nodes, elements received in return

• Parallel generation

I Ids i : i mod (NP + 1) = p are ownedby processor p

I Very algorithm-dependent

Roy H. Stogner Distributed Meshes May 23, 2012 12 / 19

Page 17: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Mesh Distribution

Mesh Creation• File 1-to-N input, serial generation

I Remote elements must be removedI One pass: flag nodes on local elementsI Second pass: delete disconnected

elements, replace with remote elem

• File N-to-N inputI Ghost elements, nodes must be gathered

from neighboring processorsI Local boundary node id lists sent to

adjacent processorsI Ghost nodes, elements received in return

• Parallel generation

I Ids i : i mod (NP + 1) = p are ownedby processor p

I Very algorithm-dependent

Roy H. Stogner Distributed Meshes May 23, 2012 12 / 19

Page 18: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Mesh Distribution

Mesh Creation• File 1-to-N input, serial generation

I Remote elements must be removedI One pass: flag nodes on local elementsI Second pass: delete disconnected

elements, replace with remote elem

• File N-to-N inputI Ghost elements, nodes must be gathered

from neighboring processorsI Local boundary node id lists sent to

adjacent processorsI Ghost nodes, elements received in return

• Parallel generationI Ids i : i mod (NP + 1) = p are owned

by processor pI Very algorithm-dependent

Roy H. Stogner Distributed Meshes May 23, 2012 12 / 19

Page 19: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Degree of Freedom Numbering

One-pass indexing

• Index processor P starting from∑P−1

1 NEp

• Pass∑P

1 NEp to processor P + 1

• O(NE)

Two-pass indexing• Count processor P indices starting from 0

• Gather NEp on all processors

• Re-index processor P from∑P−1

1 NEp

• Trade ghost indices with remote processors

• O(NE/NP ) execution time

Roy H. Stogner Distributed Meshes May 23, 2012 13 / 19

Page 20: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Degree of Freedom Numbering

One-pass indexing

• Index processor P starting from∑P−1

1 NEp

• Pass∑P

1 NEp to processor P + 1

• O(NE)

Two-pass indexing• Count processor P indices starting from 0

• Gather NEp on all processors

• Re-index processor P from∑P−1

1 NEp

• Trade ghost indices with remote processors

• O(NE/NP ) execution time

Roy H. Stogner Distributed Meshes May 23, 2012 13 / 19

Page 21: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh RefinementError Estimation

• Local residual, jump error estimators

• Refinement-based estimators

• Adjoint-based estimators

• Recovery estimators

Refinement Flagging

• Flagging by error tolerance: η2K < η2

tol/NE

I Embarrassingly parallel

• Flagging by error fraction: ηK < rmaxK ηK

I One global Parallel::max to find maximum error

• Flagging by element fraction or target NE

I Parallel::Sort to find percentile levels?I Binary search in parallel?I TBD

Roy H. Stogner Distributed Meshes May 23, 2012 14 / 19

Page 22: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh RefinementError Estimation

• Local residual, jump error estimators: embarrassingly parallel

• Refinement-based estimators: use solver parallelism

• Adjoint-based estimators: use solver parallelism

• Recovery estimators: require partitioning-aware patch generation

Refinement Flagging

• Flagging by error tolerance: η2K < η2

tol/NE

I Embarrassingly parallel

• Flagging by error fraction: ηK < rmaxK ηK

I One global Parallel::max to find maximum error

• Flagging by element fraction or target NE

I Parallel::Sort to find percentile levels?I Binary search in parallel?I TBD

Roy H. Stogner Distributed Meshes May 23, 2012 14 / 19

Page 23: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh RefinementError Estimation

• Local residual, jump error estimators: embarrassingly parallel

• Refinement-based estimators: use solver parallelism

• Adjoint-based estimators: use solver parallelism

• Recovery estimators: require partitioning-aware patch generation

Refinement Flagging

• Flagging by error tolerance: η2K < η2

tol/NE

I Embarrassingly parallel

• Flagging by error fraction: ηK < rmaxK ηK

I One global Parallel::max to find maximum error

• Flagging by element fraction or target NE

I Parallel::Sort to find percentile levels?I Binary search in parallel?I TBD

Roy H. Stogner Distributed Meshes May 23, 2012 14 / 19

Page 24: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh RefinementError Estimation

• Local residual, jump error estimators: embarrassingly parallel

• Refinement-based estimators: use solver parallelism

• Adjoint-based estimators: use solver parallelism

• Recovery estimators: require partitioning-aware patch generation

Refinement Flagging

• Flagging by error tolerance: η2K < η2

tol/NE

I Embarrassingly parallel• Flagging by error fraction: ηK < rmaxK ηK

I One global Parallel::max to find maximum error• Flagging by element fraction or target NE

I Parallel::Sort to find percentile levels?I Binary search in parallel?I TBD

Roy H. Stogner Distributed Meshes May 23, 2012 14 / 19

Page 25: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh Refinement

Elem, Node creation

• Ids i : i mod (NP + 1) = p areowned by processor p

Synchronization• Refinement Flags

I Data requested by idI Iteratively to enforce smoothing

• New ghost child elements, nodes

I Id requested by data

• Hanging node constraint equations

I Iteratively through subconstraints,subconstraints-of-subconstraints...

Roy H. Stogner Distributed Meshes May 23, 2012 15 / 19

Page 26: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh Refinement

Elem, Node creation• Ids i : i mod (NP + 1) = p are

owned by processor p

Synchronization• Refinement Flags

I Data requested by idI Iteratively to enforce smoothing

• New ghost child elements, nodes

I Id requested by data

• Hanging node constraint equations

I Iteratively through subconstraints,subconstraints-of-subconstraints...

Roy H. Stogner Distributed Meshes May 23, 2012 15 / 19

Page 27: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh Refinement

Elem, Node creation• Ids i : i mod (NP + 1) = p are

owned by processor p

Synchronization• Refinement Flags

I Data requested by idI Iteratively to enforce smoothing

• New ghost child elements, nodes

I Id requested by data

• Hanging node constraint equations

I Iteratively through subconstraints,subconstraints-of-subconstraints...

Roy H. Stogner Distributed Meshes May 23, 2012 15 / 19

Page 28: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Algorithms Parallel Mesh Manipulation

Distributed Mesh Refinement

Elem, Node creation• Ids i : i mod (NP + 1) = p are

owned by processor p

Synchronization• Refinement Flags

I Data requested by idI Iteratively to enforce smoothing

• New ghost child elements, nodesI Id requested by data

• Hanging node constraint equationsI Iteratively through subconstraints,

subconstraints-of-subconstraints...

Roy H. Stogner Distributed Meshes May 23, 2012 15 / 19

Page 29: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Applications Performance

“Typical” PDE exampleTransient Cahn-Hilliard, Bogner-Fox-Schmidt quads or hexes

101

102

103

104

103

104

105

106

107

RA

M (

MB

)

Mesh Size (quad elements)

2D SerialMesh vs. ParallelMesh Per-Node Memory Usage

1 Node Serial 2D

1 Node Parallel 2D

2 Node Serial 2D

2 Node Parallel 2D

4 Node Serial 2D

4 Node Parallel 2D

8 Node Serial 2D

8 Node Parallel 2D

16 Node Serial 2D

16 Node Parallel 2D

32 Node Serial 2D

32 Node Parallel 2D

101

102

103

104

103

104

105

106

RA

M (

MB

)

Mesh Size (hex elements)

3D SerialMesh vs. ParallelMesh Per-Node Memory Usage

1 Node Serial 3D

1 Node Parallel 3D

2 Node Serial 3D

2 Node Parallel 3D

4 Node Serial 3D

4 Node Parallel 3D

8 Node Serial 3D

8 Node Parallel 3D

16 Node Serial 3D

16 Node Parallel 3D

32 Node Serial 3D

32 Node Parallel 3D

Results• Parallel codes using SerialMesh are unchanged for ParallelMesh

• Overhead, distributed sparse matrix costs are unchanged

• Serialized mesh, indexing once dominated RAM use

Roy H. Stogner Distributed Meshes May 23, 2012 16 / 19

Page 30: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Applications Challenges

Distributed Correlation Length PostprocessingCahn-Hilliard FEM has only semilocal dependencies

(∂c

∂t, φ)Ω =−

(Mc∇f ′0(c),∇φ

)Ω− ε2c

(∆c,∇ ·MT

c ∇φ)

Ω

+((Mc∇

(f ′0(c)− ε2c∆c

))· ~n, φ

)∂Ω

+ ε2c(∆c,MT

c ∇φ · ~n)∂Ω

Correlation lengths involve global integrals

< f(~x) > ≡∫

Ω f(~x) dΩ∫Ω 1 dΩ

r(~y) ≡< c(~x)c(~x+ ~y) > − < c(~x) >2

• Communication of remote data unavoidable

• Communication of remote geometry avoidedvia coarse mesh integration

Roy H. Stogner Distributed Meshes May 23, 2012 17 / 19

Page 31: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Applications Challenges

Hypersonic Flow Calculations

Flow, transport, shock capturing, reacting chemistry, thermalnon-equilibrium, turbulence: all passing regression tests onParallelMesh at 10−9 tolerancesSerialMesh-dependent exceptions:Shock fitting, transition integrals

Roy H. Stogner Distributed Meshes May 23, 2012 18 / 19

Page 32: Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

Conclusions

Ongoing Work• DofObject serialization, redistribution

• Adaptive coarsening

• Benchmarking, optimization

• Extend Parallel:: APIs from vector,set→ map

• APIs for integro-differential terms

Any Questions?

Roy H. Stogner Distributed Meshes May 23, 2012 19 / 19