Parallel adaptive mesh refinement for incompressible flow problems

R. Rossi a,b, J. Cotela a,b,*, N.M. Lafontaine b, P. Dadvand a,b, S.R. Idelsohn a,c

a Centre Internacional de Mètodes Numèrics en Enginyeria (CIMNE), Gran Capità s/n, Edifici C1, 08034 Barcelona, Spain
b UPC BarcelonaTech, Campus Nord UPC, 08034 Barcelona, Spain
c Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

* Corresponding author at: Centre Internacional de Mètodes Numèrics en Enginyeria (CIMNE), Gran Capità s/n, Edifici C1, 08034 Barcelona, Spain.

E-mail addresses: [email protected] (R. Rossi), [email protected] (J. Cotela).

Computers & Fluids 80 (2013) 342–355

journal homepage: www.elsevier.com/locate/compfluid

Article info

Article history: Received 15 September 2011; Received in revised form 26 January 2012; Accepted 27 January 2012; Available online 10 February 2012

Keywords: Adaptive mesh refinement; Parallel computing; Incompressible Navier–Stokes; Subgrid error estimation

Abstract

The present article describes a simple element-driven strategy for the conforming refinement of simplicial finite element meshes in a distributed environment. The proposed algorithm is effective both for local adaptive refinement and for the division of all the elements within an existing mesh. We aim to provide sufficient detail to allow the practical implementation of the algorithm, which can be coded with minimal effort provided that a distributed linear algebra library is available. The proposed refinement strategy is composed of three basic components: a global splitting strategy, an elemental splitting procedure and an error estimation technique, which are combined so as to guarantee obtaining a conformant refined mesh. A number of benchmark examples show the capabilities of the proposed method. Error is estimated for the incompressible fluid-flow benchmarks using a novel indicator based on the computation of the sub-scale velocity.

© 2012 Elsevier Ltd. All rights reserved.

0045-7930/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.compfluid.2012.01.023

1. Introduction

The importance of refining a mesh in order to capture increasingly fine details is well known in the numerical community, while the possibility of providing an increased level of refinement in selected areas of the domain constitutes the key to the success of non-structured discretizations over structured alternatives. The use of Adaptive Mesh Refinement (AMR) techniques in order to improve adaptively the resolution during the solution process was soon detected as a very attractive possibility and was used extensively in different areas of engineering, but particularly in the field of compressible CFD. Mesh refinement is typically less computationally expensive than remeshing, as it uses the existing connectivity as a starting point, which simplifies the mapping of the data associated to the original nodes and elements to the new ones. In particular, if the refinement is performed by adding new nodes along the edges of the original elements, as in the case of the algorithm presented here, interpolating data to the new mesh is a trivial operation.

A number of different algorithms can be found in the literature describing multiple approaches to the splitting of simplicial elements. Most techniques are based on some kind of edge subdivision, which has proved to be very effective and relatively easy to code in a serial context. Despite the existence of various successful implementations, it is still generally difficult to port the original serial algorithms to a distributed environment.

An early investigation of local mesh refinement can be found in the works of Babuška, Rheinboldt et al. in the FEARS project [3]. Since then, local and global mesh refinement and coarsening of triangular and tetrahedral meshes have been used for finite element methods for solving linear elliptic partial differential equations [5], optimization and visualization of volume data [16], magnetostatic problems [33], compressible flows [34], finite element computation of magnetic fields and electromagnetic problems [17,19,26,38], and to generate a nearly optimal mesh in which the discretization error is equally distributed with the help of an error estimation [8,15].

Some commonly used refinement approaches are the regular refinement, the bisection [3,13,17,19,28,30,44], and refinement using the Delaunay criterion and edge splitting.

The regular refinement consists in simultaneously bisecting all edges of the triangle or tetrahedron to be refined, producing four smaller triangles or eight smaller tetrahedra. Since the regular refinement cannot generate locally refined conforming meshes, either special numerical algorithms are designed to handle the hanging nodes, or it is combined with other types of refinement to produce a conforming mesh. On the other hand, with the bisection refinement, only one edge of the triangle or the tetrahedron, called the refinement edge, is bisected, producing two smaller triangles or tetrahedra. The main advantage of bisection refinement is that it automatically produces locally refined conforming meshes and nested finite element spaces. A major drawback of the simple edge bisection algorithm is the poor mesh quality, most notably after several refinements. The main problem with bisecting refinement is how to select the refinement edge such that triangles or tetrahedra produced by successive refinements do not degenerate.

The methods for selecting the refinement edge proposed by various authors can be classified into two categories, namely the longest edge approach [21] and the newest vertex approach (the latter is also called the newest node approach by some authors).

A class of algorithms based on the refinement of the longest edge is proposed and studied by Rivara et al. [41]. In these algorithms, triangles or tetrahedra are always bisected using one of their longest edges, and the finite termination of the refinement procedure is obvious because when traversing from one simplex to another in order to make the resulting mesh conforming, one steps along paths of simplices with longer longest edges. In [42], some mathematical guarantees are provided on the quality of the resulting mesh.

The newest vertex approach was first proposed for two-dimensional triangular meshes by Sewell [43], and was generalized to three dimensions by Bänsch [6]. More recent work on the newest vertex algorithms is described in the papers of Kossaczký [29] and Liu and Joe [32]. The concept of the newest vertex approach is very simple in two dimensions: once the refinement edge for the triangles in the initial mesh is determined, the refinement edge of a newly generated triangle is the edge opposite to its newest vertex. Unfortunately, its generalization to three dimensions is highly non-trivial, and the algorithms proposed by various authors are essentially equivalent, but use different interpretations. It is theoretically proved that tetrahedra generated by these algorithms belong to a finite number of similarity classes, which ensures non-degeneracy of tetrahedra produced by repeated bisections.

There are also works in the literature on parallel mesh refinement algorithms for triangular and tetrahedral meshes [4,13,21,28,31], and some of them use bisection schemes [44]. Rivara et al. proposed a parallel algorithm for global refinement of tetrahedral meshes which is not suitable for adaptive local refinement. Pebay and Thompson [39] presented a parallel refinement algorithm for tetrahedral meshes based on edge splitting. Jones and Plassmann [27] proposed and studied a parallel algorithm for adaptive local refinement of two-dimensional triangular meshes. Barry et al. [7] also presented a framework for implementing parallel adaptive finite element applications and demonstrated it with 2D and 3D plasticity problems. Zang [44] presents a parallel algorithm for distributed memory parallel computers using bisection, which is characterized by allowing simultaneous refinement of submeshes to arbitrary levels before synchronization between submeshes and without the need of a central coordinator process for managing new vertices. His algorithm is based on the standard message passing interface MPI. The mesh is partitioned in submeshes, as many as the number of MPI processes. Partitioning is computed using METIS. After partitioning the mesh, in the first step of his algorithm, the submeshes are refined independently, with the shared faces treated as if they were boundary faces. Then, tetrahedra containing one or more shared faces which have been bisected during the first phase are exchanged between neighbor submeshes, and tetrahedra having one or more hanging shared faces are bisected. The process stops when global conformity of the mesh is reached, as the first step creates submeshes with non-conforming shared faces. De Cougny and Shephard [13] use edge-based subdivision templates for refinement, that is, they use pre-defined templates to refine mesh regions. Like the other method, a two-step process is necessary for parallel refinement: the first phase consists in subdividing mesh faces on partitions and then, in a second phase, the meshes are subdivided using the templates as in serial. However, one has to make sure that duplicate faces on partition boundaries are meshed identically. An improved simple cell-quality control for a large-scale unstructured tetrahedral mesh for parallel AMR was proposed by Lian et al. [31]. It was designed such that the resulting refined mesh information can be readily utilized in both node or cell-based numerical methods. Parallel implementation of its mesh refinement is widely discussed in the same reference.

Other mesh refinement strategies in the literature use a data structure for adaptive Finite Element computation of 2D and 3D problems. Using a hierarchical minimal tree based data structure for mesh refinement is proposed in [9]. This algorithm is implemented by imposing a one-level rule and using the adjacent neighbor (sharing edges and faces) concept for recursive refinement. This technique generates mesh refinement data such as a connectivity matrix, an automatic local and global node numbering, a natural order of element sequence, and a coordinate array for the refined elements. A local mesh refinement using Local Delaunay Subdivisions is presented by T.W. Nehl and D.A. Field for solving magnetostatic problems [9].

Besides the need of organizing operations so as to minimize random communications, the implementation of such techniques generally requires optimized custom data structures which are normally not easily available to code developers in other fields.

The present document describes a new strategy to perform local mesh refinement, based on the division of chosen elements by splitting their edges. The rationale at the base of our approach is to provide an algorithm that can be implemented with minimal effort, leveraging existing linear-algebra data structures. As we shall describe in the following, our technique assumes the availability of a distributed sparse-matrix package, preferably prepared for Finite Element assembly. Our testing was done using the Epetra package of the Trilinos Framework [20], but similar capability can be found in any other equivalent library. The only other dependency is a routine that provides the splitting pattern for all of the cases of interest. Our aim is to provide, together with this work, a liberally licensed version of such a routine which considers all of the 729 (3^6) cases which may appear during the subdivision of tetrahedra.

As a matter of fact, a number of common steps are needed to allow the use of adaptive refinements within a distributed code, namely

- Implementation of a refinement strategy
- Definition of refinement indicators
- Preparation of the code to allow variations in the connectivity
- Load balancing

While the last three of these points depend on the specific problem of interest as well as on the implementation of the software, the refinement strategy is modular with respect to the others. The objective of the present article is exactly to focus on this aspect, describing an algorithm that is amenable to a modular implementation within any unstructured distributed code (on simplicial meshes). In accordance with this goal, emphasis will be placed on the more practical aspects of the algorithm and its implementation on a distributed environment, and mesh quality will not be considered in this document.

In addition, an error estimator for incompressible flow problems will be presented in the final part of this document. This estimator will allow the definition of a refinement criterion which will be used in the application examples. Obviously, other techniques will be required to determine where refinement should be performed in other fields but, as mentioned, this last component is independent of the refinement algorithm.

2. Global splitting algorithm

In the current section, we aim to describe our proposal for the subdivision algorithm. The goal is to obtain a conformant mesh by splitting an arbitrarily defined subset of the elements in the mesh (which may well contain the totality of the elements).

Our approach is based on the division of the elements identified as candidates for refinement by introducing new nodes on the midpoints of their edges, dividing each triangle into four smaller ones or each tetrahedron into eight parts. Adjacent elements are split accordingly to preserve a conforming connectivity.
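The midpoint subdivision described above can be sketched for triangles as follows. This is only an illustrative serial sketch: the function name `split_triangle`, the dictionary-based storage and the way new ids are generated are assumptions made for the example, not part of the implementation discussed in this article.

```python
def split_triangle(tri, coords, midpoint_ids):
    """Split one triangle into four by inserting a node at each edge midpoint.

    tri          -- tuple of three node ids
    coords       -- dict: node id -> (x, y); extended in place with new nodes
    midpoint_ids -- dict: sorted edge (i, j) -> id of its midpoint node,
                    shared between elements so that neighbours reuse the node
    """
    a, b, c = tri

    def mid(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_ids:
            new_id = max(coords) + 1  # hypothetical numbering: next free id
            coords[new_id] = tuple((p + q) / 2.0 for p, q in zip(coords[i], coords[j]))
            midpoint_ids[key] = new_id
        return midpoint_ids[key]

    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    # Three corner triangles plus the central one, same orientation as the parent.
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


coords = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0), 4: (1.0, 1.0)}
midpoints = {}
children = split_triangle((1, 2, 3), coords, midpoints)
children += split_triangle((2, 4, 3), coords, midpoints)
```

Because the midpoint dictionary is keyed by the sorted pair of end-node ids, the two triangles sharing edge (2, 3) reuse the same new node, which is what keeps the refined mesh conforming in this serial setting.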

The approach is designed under the rather standard assumption that the domain splitting assigns each tetrahedron uniquely to one of the MPI domains. Here, nodes are assigned to a single process, which is said to own them although, occasionally, some of these nodes will appear in elements owned by a different process. Nodes which appear in the local elements of a process but are not owned by it are referred to throughout this document as ghost nodes. It is assumed that all processes will store a copy of all their ghost nodes and the database of nodal values associated to them.

Each node is expected to have a uniquely defined global Id (GID) and to store the rank of the owner process, and each edge is uniquely identified by the GIDs of its end nodes.
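A minimal sketch of these two conventions (edges identified by the GIDs of their end nodes, and ghost nodes defined by ownership) might look as follows. The helper names `edge_key` and `ghost_nodes` and the `owner_of` dictionary are hypothetical and introduced only for illustration.

```python
def edge_key(gid_i, gid_j):
    """Order-independent identifier of an edge, built from its end-node GIDs."""
    return (gid_i, gid_j) if gid_i < gid_j else (gid_j, gid_i)


def ghost_nodes(local_elements, owner_of, my_rank):
    """Nodes used by the local elements of a process but owned by another one."""
    local_nodes = {n for elem in local_elements for n in elem}
    return {n for n in local_nodes if owner_of[n] != my_rank}


# Two triangles on two processes; nodes 1, 2 owned by rank 0 and 3, 4 by rank 1.
owner_of = {1: 0, 2: 0, 3: 1, 4: 1}
ghosts_rank0 = ghost_nodes([(1, 2, 3)], owner_of, my_rank=0)
ghosts_rank1 = ghost_nodes([(2, 4, 3)], owner_of, my_rank=1)
```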

Since the algorithm is rather complex and articulated in six steps, we believe that the only viable presentation is by providing a commented pseudo-code. In order to abstract from a specific implementation of the linear algebra library, we will assume that some capabilities are provided by the linear algebra library, namely:

ConstructLocal_FEGraph: Function that defines the local connectivity, that is, the connectivity of all of the nodes that are needed within a single MPI process and without communications (this will include both owned and ghost nodes for a given MPI process). This abstract function should take as input a given finite element submesh, that is, the list of elements of the corresponding subdomain, and construct the corresponding graph.

ConstructByLocal_Graph: Function that allows assembling the global graph of a matrix given the local graphs of which it is made, taking care of all of the MPI communications needed.

copy: Function that creates a copy of a global matrix. Note that this can be done without communication, locally in each domain.

SetScalarValue: Function to set to a given scalar number all of the non-zero entries in a matrix. Note that no communication is required to do this.

ADD_LocalMatrix: Function that assembles a local (sparse) matrix into the global one, by summing up local contributions into the corresponding global positions, taking care of all of the MPI communications needed.

REPLACE_LocalMatrix: Function that replaces the values contained in a local (sparse) matrix into the global one, taking care of all of the MPI communications needed. Note that the results of this operation are not required to be deterministic, in the sense that they could be allowed to vary between runs.

GetLocalView: Function that allows obtaining a local view of the terms of the global matrix which are described by the local graph. This function is expected to handle all of the MPI communications needed for importing matrix elements that are stored remotely to a locally available sparse matrix.

Such functions map naturally to the implementation provided by the Trilinos Epetra package (the REPLACE function requires a feature first introduced in Trilinos version 10.8.3), but they could be implemented with relative ease within the framework provided by other distributed packages. The key idea is that the synchronization will be hidden by the interface between local and global matrices, so that by obtaining a local view of a global matrix, we will have access to the synchronized values after the different local contributions were added or replaced within the global matrix.
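To make the semantics of this abstract interface concrete, the following single-process stand-in mimics the operations listed above with a plain dictionary. It is only a toy model: the class `ToyGlobalMatrix` and its method names are hypothetical, no MPI communication takes place, and it does not reproduce the Epetra API.

```python
import copy


class ToyGlobalMatrix:
    """Single-process stand-in for a distributed sparse matrix (illustration only)."""

    def __init__(self, local_graphs):
        # Counterpart of ConstructByLocal_Graph: union of all local graphs.
        self.values = {edge: 0 for graph in local_graphs for edge in graph}

    def set_scalar_value(self, value):
        # Counterpart of SetScalarValue.
        for edge in self.values:
            self.values[edge] = value

    def add_local(self, local):
        # Counterpart of ADD_LocalMatrix: contributions are summed.
        for edge, v in local.items():
            self.values[edge] += v

    def replace_local(self, local):
        # Counterpart of REPLACE_LocalMatrix: the last writer wins.
        self.values.update(local)

    def local_view(self, graph):
        # Counterpart of GetLocalView: synchronized values on the local graph.
        return {edge: self.values[edge] for edge in graph}


graph_rank0 = {(1, 2), (1, 3), (2, 3)}
graph_rank1 = {(2, 3), (2, 4), (3, 4)}
A = ToyGlobalMatrix([graph_rank0, graph_rank1])
A.set_scalar_value(0)
A.add_local({(2, 3): 1})             # contribution of "rank 0"
A.add_local({(2, 3): 1, (2, 4): 1})  # contribution of "rank 1"
P = copy.deepcopy(A)                 # counterpart of copy
P.replace_local({(2, 3): 7})
view = A.local_view(graph_rank0)
```

After the two additions, the shared edge (2, 3) holds the sum of both contributions in every local view, which is the synchronization behaviour the algorithm relies on.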

We should also remark that the matrix to be used is expected to be symmetric, so that each edge with GIDs (I, J) is associated one-to-one with a single matrix entry. Symmetry is in particular crucial if a non-deterministic implementation of the REPLACE function is used.

On the basis of these definitions we can thus describe the steps of our algorithm, which can be written as

1. First of all we need to construct the local graph, intended as a sparse matrix that has a nonzero entry for any pair (I, J) of indices that identify an edge. This matrix should be constructed in an elementwise fashion so that any edge in the local FE mesh will automatically be part of the matrix.
   The local connectivity will then be used to allocate global matrices, which will be stored in the natural format of the underlying linear algebra library.
   In this step we will also allocate the memory for all of the sparse matrices to be used in the implementation, namely A, which we will use to identify the edges to be refined; P, which will ultimately contain the rank to be assigned to each of the refinement nodes, that is, the parallel process that will own the node; and IdMatrix, which will be used towards the end of the algorithm to store the GIDs of the new nodes.

    my_rank = my MPI rank
    ConstructLocal_FEGraph(local_graph)
    A.ConstructByLocal_Graph(local_graph)
    A.SetScalarValue(1)
    P = copy(A)
    P.SetScalarValue(1)
    IdMatrix = copy(A)
    IdMatrix.SetScalarValue(0)
    Alocal = GetLocalView(A,local_graph)
    Plocal = GetLocalView(P,local_graph)
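The elementwise construction of the local graph in step 1 can be illustrated with the short sketch below. The function name `construct_local_fe_graph` is a hypothetical counterpart of ConstructLocal_FEGraph, and storing only the pairs with I < J is an assumption that exploits the symmetry of the matrix.

```python
from itertools import combinations


def construct_local_fe_graph(elements):
    """Return the set of edges (I, J), with I < J, present in a list of simplices."""
    graph = set()
    for elem in elements:
        # Every pair of nodes of a simplex (triangle or tetrahedron) is an edge.
        for i, j in combinations(sorted(elem), 2):
            graph.add((i, j))
    return graph


# Two tetrahedra sharing the face (2, 3, 4), with the GIDs used in Fig. 2.
graph = construct_local_fe_graph([(1, 2, 3, 4), (2, 5, 3, 4)])
```

The two tetrahedra have six edges each, three of which are shared, so the resulting graph contains nine entries.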

2. As a second step we identify the elements that are scheduled for refinement by looping over the elements and following the arbitrary suggestions of the user. Note that the description of our refinement algorithm is completely independent of the approach used in choosing the elements to be refined. The decision of which elements to refine is left to the user, allowing for arbitrary refinement or, in the case of AMR, refinement based on the suggestions of an external error estimation routine.
   In our proposal, elements marked for refinement will be split along all their edges, resulting in four triangles or eight tetrahedra for each original element, a division that preserves the quality of the refined elements. Elements that share one or more edges with elements marked for refinement will be refined accordingly to preserve the connectivity, although in this case there is no guarantee on the final mesh quality. Of course different refinement strategies could be used at this point without changing the global algorithm.
   To identify the edges that will be refined, we have adopted the convention of adding -1 to the corresponding matrix entry in A. This identification of the edges to be refined will be performed first at local level and then communications will be performed to ensure a consistent behavior across processor boundaries. A local copy of the splitting pattern will finally be gathered for usage in the next steps.

    loop on local elements
        if user requests splitting
            loop on edges of the element
                I,J = GIDs of the nodes at each element edge
                Alocal(I,J) = Alocal(I,J) - 1
    A.ADD_LocalMatrix(Alocal)
    Alocal = GetLocalView(A,local_graph)
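The effect of step 2 across processor boundaries can be mimicked in a single process as in the sketch below. Note that, for readability, it counts marks with positive integers instead of following the sign convention of the pseudo-code; the simulated "ranks", the dictionaries and all names are assumptions made only for this example.

```python
from itertools import combinations


def edges_of(elem):
    """Edges of a simplex as sorted GID pairs."""
    return [tuple(sorted(e)) for e in combinations(elem, 2)]


# Two simulated ranks, each holding one tetrahedron (GIDs as in Fig. 2).
local_elements = {0: [(1, 2, 3, 4)], 1: [(2, 5, 3, 4)]}
marked = {0: {(1, 2, 3, 4)}, 1: set()}  # only rank 0 requests a split

# Local phase: each rank counts how often each of its edges is marked.
local_marks = {}
for rank, elems in local_elements.items():
    counts = {e: 0 for elem in elems for e in edges_of(elem)}
    for elem in elems:
        if elem in marked[rank]:
            for e in edges_of(elem):
                counts[e] += 1
    local_marks[rank] = counts

# Global phase (stand-in for ADD_LocalMatrix followed by GetLocalView):
# contributions are summed and every rank reads back the synchronized value.
global_marks = {}
for counts in local_marks.values():
    for e, n in counts.items():
        global_marks[e] = global_marks.get(e, 0) + n

to_refine = {
    rank: {e for e in counts if global_marks[e] > 0}
    for rank, counts in local_marks.items()
}
```

Although rank 1 did not mark any element, after synchronization it learns that the three edges of the shared face (2, 3, 4) must be split.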

3. In the third step we assign the owner of the new nodes to be created. Here all of the MPI processes that will need a local copy of the node on edge I,J propose to be the owner by marking the corresponding values in Plocal with their MPI rank. The final owner of the node is not important as long as it is uniquely defined and known to all of the processes. We thus advocate the use of a REPLACE function which guarantees that global values are overwritten by local contributions so that only the last contribution remains. This is nondeterministic, but appears to work satisfactorily.

    loop on local elements
        loop on edges of the element
            I,J = GIDs of the nodes at each element edge
            if (Alocal(I,J) < 1)
                Plocal(I,J) = my_rank
    P.REPLACE_LocalMatrix(Plocal)
    Plocal = GetLocalView(P,local_graph)
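The "last contribution remains" behaviour of step 3 can be illustrated with the following toy model, in which the order of arrival of the proposals plays the role of the nondeterminism. The function `assign_owners` and the proposal list are hypothetical and serve only to show that any arrival order yields a single, globally known owner per edge.

```python
def assign_owners(proposals_in_arrival_order):
    """proposals: iterable of (edge, rank); later entries overwrite earlier ones."""
    owner = {}
    for edge, rank in proposals_in_arrival_order:
        owner[edge] = rank
    return owner


# Ranks 0 and 1 both hold edges (2, 3) and (2, 4) and both propose themselves.
proposals = [((2, 3), 0), ((2, 4), 0), ((2, 3), 1), ((2, 4), 1)]
owners_run1 = assign_owners(proposals)
owners_run2 = assign_owners(reversed(proposals))
```

The two runs assign different owners, yet in each run every edge ends up with exactly one owner that all processes can read back. If reproducibility between runs were required, one possible variation (not the approach described here) would be to reduce the proposals with a deterministic rule such as the minimum rank.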

4. In the fourth step we know exactly which nodes have to be created and who will be their owner. We need to find a suitable global Id, so that the global ids will be numbered consecutively, preferably avoiding gaps, and ensuring that no global id is repeated. The technique we propose is to have each of the different processes count the new nodes of which it will be the owner, and to use a scan-sum based approach so that the new nodes will start from the greatest id found prior to splitting and have increasingly high ids depending on their processor and position within the local nodes.
   Since all of the processors involved may need access to the id of the new ghost nodes, we will write the newly assigned id to a matrix and synchronize it between the processors. The pseudo-code of our proposal is thus of the type:

    local_node_counter = 0
    for I in rows of Alocal
        for J in columns of Alocal
            if (Alocal(I,J) < 1) //need remesh!
                if (Plocal(I,J) == my_rank) //we own the node
                    local_node_counter = local_node_counter + 1

    use scan sum to determine new_id_local_start

    new_id = new_id_local_start
    for I in rows of Alocal
        for J in columns of Alocal
            if (Alocal(I,J) < 1) //need remesh!
                if (Plocal(I,J) == my_rank) //we own the node
                    IdMatrix_local(I,J) = new_id
                    new_id = new_id + 1

    IdMatrix.ADD_LocalMatrix(IdMatrix_local)
    IdMatrix_local = GetLocalView(IdMatrix,local_graph)
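The scan-sum numbering of step 4 can be sketched serially as follows, with a list index standing in for the MPI rank. The function `assign_new_ids` and its arguments are hypothetical; in an MPI code the exclusive prefix sum would typically be obtained with a collective such as MPI_Exscan rather than computed on one process.

```python
from itertools import accumulate


def assign_new_ids(owned_edges_per_rank, max_existing_id):
    """Give consecutive, gap-free global ids to the new nodes of every rank."""
    counts = [len(edges) for edges in owned_edges_per_rank]
    # Exclusive prefix sum: number of new nodes owned by lower ranks.
    offsets = [0] + list(accumulate(counts))[:-1]
    ids = {}
    for rank, edges in enumerate(owned_edges_per_rank):
        start = max_existing_id + 1 + offsets[rank]
        for k, edge in enumerate(sorted(edges)):
            ids[edge] = start + k
    return ids


# Rank 0 owns two new nodes, rank 1 none, rank 2 three; largest existing id is 5.
ids = assign_new_ids(
    [[(1, 2), (1, 3)], [], [(2, 3), (2, 4), (3, 4)]],
    max_existing_id=5,
)
```

The new ids start right after the largest existing id and are consecutive across ranks, even when one rank owns no new node.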

5. In the fifth step all the information needed to perform the creation of new nodes on the edges is already available locally. We will thus create them with the global id we identified, perform the interpolation of coordinates and nodal data from the edge vertices, and assign the owner.

    // admissible node Ids start with 1 (not zero!)
    for I in rows of IdMatrix_local
        for J in columns of IdMatrix_local
            if (IdMatrix_local(I,J) > 0)
                generate new node in the middle of edge I,J
                interpolate from I and J
                assign global id = IdMatrix_local(I,J)
                assign node_rank = Plocal(I,J)
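A minimal sketch of step 5 is given below, under the assumption that the local views are available as dictionaries keyed by edge and that a single scalar value is stored per node. The function `create_edge_nodes` and the layout of the returned records are hypothetical.

```python
def create_edge_nodes(id_matrix_local, owner_local, coords, values):
    """Create the midpoint node of every split edge and interpolate its data."""
    new_nodes = {}
    for (i, j), gid in id_matrix_local.items():
        if gid > 0:  # 0 means "edge not split": admissible ids start at 1
            new_nodes[gid] = {
                "coords": tuple(0.5 * (p + q) for p, q in zip(coords[i], coords[j])),
                "value": 0.5 * (values[i] + values[j]),
                "rank": owner_local[(i, j)],
            }
    return new_nodes


coords = {1: (0.0, 0.0, 0.0), 2: (2.0, 0.0, 0.0), 5: (0.0, 2.0, 0.0)}
values = {1: 10.0, 2: 20.0, 5: 30.0}
id_matrix_local = {(1, 2): 6, (1, 5): 0}  # only edge (1, 2) is split
owner_local = {(1, 2): 1, (1, 5): 0}
new_nodes = create_edge_nodes(id_matrix_local, owner_local, coords, values)
```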

6. In the sixth step we generate the new elements using the newly created nodes. The difficulty (which we will address in the next section) is to guarantee that the split mesh is still conformant.
   In order to do so we loop over the elements and, for each element, we gather the data associated to each of the edges. If a new node is associated to any of the edges, local refinement is needed and will be performed.
   Conformity is guaranteed by the helper functions we provide, which we will discuss later on. We should remark that the creation of new nodes may be needed at this stage depending on the splitting pattern to be used. Nevertheless, any such new nodes are purely local and therefore no new communication will be needed, besides the minimal one needed for assigning a unique global id.

    loop on elements
        if any elemental edge is split
            gather newly created nodes
            split element (see next section)
            interpolate elemental data

The proposed algorithm has the advantage of being largely independent of the specific data structures used, and can be introduced with relative ease within a wide range of computational codes. A working implementation of the algorithm can be freely downloaded, together with the Kratos Multiphysics code (see [14]) at [1].

Before proceeding, we should remark that, once the algorithm finishes, the load can be severely unbalanced and action may be needed in this sense. Since the implementation of the load balancing step largely depends on the specific features of the different computational codes, we do not intend to discuss the issue here. In addition to this, the proposed renumbering scheme is not optimal and users may wish to perform a renumbering pass in order to improve the ordering of nodes.

We should also observe that at the end of the refinement step, the communication pattern between the different domains might change. A simple example helps to explain why this may happen. Let us consider the configuration shown in Fig. 1. Assume that the original mesh, composed of triangles 1, 2 and 3, is partitioned so that these triangles are owned by processes 1, 2 and 3 respectively. Process 1 owns node a, while process 3 owns nodes b and c. No node is owned by processor 2.

In order to perform assembly operations, it is customary to gather nodal data to the owner, which will sum the different contributions and finally spread it to the other nodes. For the configuration described, processor 1 needs to gather the data related to node a from processor 2 and the data of node b from processor 2 and 3. Processor 2 on the other hand will not need to gather any nodal data since it is not the owner of any node. This implies that when coloring the communications processor 2 will not be considered for the gathering phase.

Fig. 1. An example of a mesh where recalculation of the communication pattern will be required.

Let us suppose now that element 2 is marked for refinement and a new node has to be inserted on edge ab. With our algorithm, the owner of the new node can be either process 1 or 2. We will assume for the sake of the discussion that the owner is 2. If this happens, processor 2 becomes the owner of a node and should be included in the gathering phase, hence invalidating the original coloring. As a consequence, the communication pattern has to be recomputed to take into account potential variations of the node ownerships. In general, the recalculation of an appropriate communication pattern between the different domains is implementation-dependent and falls outside the objective of the current paper; nevertheless it is important to take into account that action may be needed to take care of this situation. Finally, an open problem is the optimization of the quality of the refined meshes. One possibility to be explored in this sense could be the use of ideas taken from longest-edge refinement algorithms in order to improve the quality of the refined meshes.

3. Splitting of simplicial elements

As observed in the introduction, the proposed implementation relies on the availability of both a flexible linear-algebra library and an efficient splitting procedure. The aim of the current section is to describe our implementation of the elemental splitting process, discussing briefly how to choose the splitting mode in order to guarantee obtaining a conformant mesh.

The basic idea is that the information available during the 6th step of the refinement algorithm (GID of the new nodes, and their owner) has to be sufficient to uniquely define the splitting of the elements under the additional constraint that all the faces of a given tetrahedron should coincide with the faces of the neighboring element, that is to say, the refinement should be conformant.

This is best understood through a practical example. Let us consider the two tetrahedra shown in the top part of Fig. 2. The figure shows two original tetrahedra, with GIDs 1,2,3,4 and 2,5,3,4 respectively, which have to be split by adding the new nodes 6 and 7. In the top part of the figure, a correct splitting is achieved, as the edge (2,7) exists in both of the tetrahedra that share the face 2-3-4. In the bottom part, on the other hand, the nodes 6 and 7 are correctly added to both tetrahedra, but in the rightmost tetrahedron the edge (3,6) is included in the refined mesh while the edge (2,7) is created in the neighboring tetrahedron. As a consequence, the splitting is to be considered wrong, as the face 2-3-4 is split differently in the two neighboring tetrahedra, leading to a non-conformant mesh.

Such a potentially conflicting case happens when two nodes are added to two of the edges that form a face while the third edge is not split.

A traditional approach would be to share some information between the elements that share a face so that the splitting is performed in the same way for the two of them. This is easily done in a serial context but is non-trivial within a parallel process.

Our proposal attempts to avoid such communication, choosing a splitting pattern exclusively on the basis of locally available data. The idea at the heart of our approach is that such information could be provided by telling how a uniformly refined tetrahedron should be collapsed to obtain the desired splitting pattern for all of its faces. This is done by indicating, for the edges that are not to be split, the direction toward which the edges of the corresponding uniformly refined element are to be collapsed.

If we assume that a given edge is identified by the GIDs I and J, and that LID(I) and LID(J) give us the local IDs that correspond to the vertices of the element, we will perform for each edge an operation of the type

    if (edge is not split)
        if (GID(I) < GID(J))
            collapse_towards = LID(I)
        else
            collapse_towards = LID(J)
    else
        collapse_towards = edge_id

where the variable collapse_towards (assigned for each edge of the element) will tell us the direction towards which we should move the node in the center of the edge in order to orient the appropriate collapse. An example of this situation is pictured in Fig. 3. Given the face abc, assume that edges (b, c) and (c, a) are split. If edge (a, b) is also refined, the triangle is split into four smaller ones. Otherwise, there are two possible outcomes, which can be observed in the figure, and correspond to the situations we have identified as collapsing edge (a, b) towards a and towards b, respectively.
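The collapse operation on a single face can be sketched as follows: the face is first split uniformly into four triangles and, when edge (a, b) is not refined, its virtual midpoint is moved onto the end node with the smaller GID, which removes one degenerate triangle. The function `split_face` is hypothetical, and the placement of node 6 on edge (2,4) and node 7 on edge (3,4) is an assumption chosen to be consistent with the edges (2,7) and (3,6) mentioned for Fig. 2.

```python
def split_face(a, b, c, m_bc, m_ca, m_ab=None):
    """Triangles obtained for face (a, b, c) whose edges (b,c) and (c,a) are split.

    If m_ab is None, edge (a,b) is not split and its virtual midpoint is
    collapsed onto the end node with the smaller GID.
    """
    if m_ab is None:
        m_ab = min(a, b)
    uniform = [(a, m_ab, m_ca), (m_ab, b, m_bc), (m_ca, m_bc, c), (m_ab, m_bc, m_ca)]
    # Collapsing turns one of the four triangles into a degenerate one: drop it.
    return [t for t in uniform if len(set(t)) == 3]


# Face 2-3-4 as seen from two neighbouring elements with different local ordering.
from_left = split_face(2, 3, 4, m_bc=7, m_ca=6)
from_right = split_face(3, 2, 4, m_bc=6, m_ca=7)
same_split = {frozenset(t) for t in from_left} == {frozenset(t) for t in from_right}
```

Both elements obtain the same three triangles, including the diagonal (2,7), without exchanging any information, because the decision depends only on the GIDs of the end nodes of the unsplit edge.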

If we apply this algorithm to the example in Fig. 2, we will thus associate the tag 2 to the edge (2,3), meaning that any new edges created in the faces adjacent to that edge will be connected to node 2 rather than to node 3.

In practice, the overall splitting pattern for the tetrahedron is identified by writing a tag on an auxiliary array of size 6 (the number of edges of the element) that prescribes, for each edge, which local node should be used in the splitting. A number less than 4 indicates the LID of the node towards which the new edges on the surrounding faces should be oriented. A number equal to or greater than 4, on the other hand, indicates the LID of a new node to be used in the refinement, corresponding to a unique edgeId. Table 3 shows the different possibilities for each edge, together with the local numbering we use to identify the edges.

For example the first edge, having edge index = 0, is identified by the local ids (0,1), the second by (0,2), and so on. We will associate to the first edge either the value 0 or 1 (the LIDs of its nodes) when the edge is not to be split, or its edgeId in case splitting is required.

As we have three possible choices per edge and a total of six edges, the total number of possible combinations is 3^6 = 729, which need to be considered as potential splitting patterns. The totality of such combinations is considered in the splitting subroutine we provide.
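The count of 729 patterns follows directly from enumerating three states for each of the six edges, as the short check below illustrates. The state labels are hypothetical names introduced only for this example.

```python
from itertools import product

# Each of the 6 edges takes one of 3 states: collapse towards its first node,
# collapse towards its second node, or split.
states = ("first", "second", "split")
patterns = list(product(states, repeat=6))

n_patterns = len(patterns)
# Exactly one pattern splits all edges (uniform refinement into eight parts) ...
n_uniform = sum(all(s == "split" for s in p) for p in patterns)
# ... and 2^6 patterns split no edge at all, so the element is left untouched.
n_untouched = sum(all(s != "split" for s in p) for p in patterns)
```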

In pseudo-code, the overall selection process will thus look as follows:

    #define an array with the LIDs associated to each edge
    edge_id[0] = 4; edge_id[1] = 5; edge_id[2] = 6;
    edge_id[3] = 7; edge_id[4] = 8; edge_id[5] = 9;

    edge_counter = 0
    for LID1 = 0..2:
        for LID2 = LID1+1..3:
            obtain GID1, GID2 associated to the nodes with LID1, LID2
            if (edge is not split)
                if (GID1 < GID2)
                    edge_id[edge_counter] = LID1
                else
                    edge_id[edge_counter] = LID2
            else
                do nothing (already set before the loop)
            edge_counter = edge_counter + 1

Fig. 2. Conformant and non-conformant splitting of two neighboring tetrahedra: (a) correct (conformant) splitting; (b) wrong (non-conformant) splitting.

Fig. 3. Collapse operations for edge (a, b), from left to right: split edge; edge collapsed towards a; edge collapsed towards b.

By directly applying this algorithm to the two tetrahedra in Fig. 2 we can construct Tables 2 and 3. The final outcome is to associate the edge list 0 0 0 1 8 9 to the top-left tetrahedron and 0 0 6 2 4 9 to the right one. This information is enough to uniquely select a conformant splitting pattern for the two tetrahedra of interest.
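The selection process can be sketched as a short runnable Python function. This is a minimal sketch under stated assumptions: the function name `edge_list` and the representation of the split edges as a set of edge indices are hypothetical choices made for illustration, and the edge ordering follows Table 1. It is not the C routine distributed with Kratos.

```python
# Local node pairs for the six edges, ordered as in Table 1 (assumed ordering).
EDGE_NODES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def edge_list(gids, split_edges):
    """Build the six-entry edge list of a tetrahedron.

    gids        -- global ids of the four nodes, indexed by local id
    split_edges -- set of edge indices (0..5) that are to be split
    """
    tags = [4, 5, 6, 7, 8, 9]  # default: every edge keeps its own edge id
    for edge_index, (lid1, lid2) in enumerate(EDGE_NODES):
        if edge_index not in split_edges:
            # Unsplit edge: collapse towards the node with the lower global id.
            tags[edge_index] = lid1 if gids[lid1] < gids[lid2] else lid2
    return tags


# Top-left tetrahedron of Fig. 2: GIDs 1 2 3 4, edges (1,3) and (2,3) split.
print(edge_list([1, 2, 3, 4], {4, 5}))
```

Because the choice depends only on global ids, two tetrahedra sharing an unsplit edge always select the same end node for it, which is what makes the patterns on a shared face compatible.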

On the other hand, if we consider the non-conformant splitting shown in the bottom right corner and construct its edge list, we will immediately see that a different splitting mode is selected. Table 4 shows the indices obtained for this case, and highlights the point at which the choice differs from the strategy we propose. The final splitting pattern is 0 2 6 2 4 9 which, as expected, does not coincide with the correct one.

The time-consuming part of the implementation is certainly the definition of a subroutine that provides the correct splitting pattern once given the edgeId list. An LGPL-licensed C implementation of the proposed splitting strategy can be downloaded freely from the Kratos website [1,2].

In an attempt to simplify the interface we provide three helper functions, namely TetrahedraSplitMode, Split Tetrahedra and TetrahedraGetNewConnectivityGID.

The user is expected to define an auxiliary vector of size 10, which contains in the first four positions the GIDs of the nodes of the tetrahedron to be split. The positions between 4 and 9 correspond to the edges of the element (ordered as in Table 1). Their value should be -1 if the edge is not to be split, or the GID of the node to be inserted if splitting is required.

The function TetrahedraSplitMode takes this auxiliary vector as input and returns a second work vector, which assigns a local id to each edge as described in the paragraph above.

The output of this function is used as input for Split Tetrahedra, which performs the splitting and returns an array with the LIDs of the new tetrahedra. A flag is returned to identify whether a new central node is needed.

Finally, the function TetrahedraGetNewConnectivityGID simplifies the creation of the mesh.

The documentation attached to the file provides a detailed example of usage.

Table 1
Definition of the edges of a tetrahedron and possible refinement outcomes.

Edge index   Edge Id   LID1   LID2   Possibilities
0            4         0      1      0  1  4
1            5         0      2      0  2  5
2            6         0      3      0  3  6
3            7         1      2      1  2  7
4            8         1      3      1  3  8
5            9         2      3      2  3  9

Table 2
Top-left tetra, original GIDs 1 2 3 4, identifier: 0 0 0 1 8 9.

Edge id   LID1   LID2   New LID   GID1   GID2   New GID   Reason of choice
4         0      1      0         1      2      1         Edge not split and GID1 < GID2
...
8         1      3      8         2      4      6         Insert new node 6 at edge id 8
...

Table 3
Top-right tetra, original GIDs 2 5 3 4, identifier: 0 0 6 2 4 9.

Edge id   LID1   LID2   New LID   GID1   GID2   New GID   Reason of choice
...
7         1      2      2         5      3      3         Edge not split and GID1 > GID2
8         1      3      4         5      4      4         Edge not split and GID1 > GID2
9         2      3      9         3      4      7         Insert new node 7 at edge id 5

Table 4
Bottom-right tetra, original GIDs 2 5 3 4, identifier: 0 2 6 2 4 9.

Edge id   LID1   LID2   New LID   GID1   GID2   New GID   Follows convention?
4         0      1      0         2      5      2         Ok
5         0      2      2         2      3      3         Not following convention!
6         0      3      6         2      4      6         Ok
7         1      2      2         5      3      3         Ok
8         1      3      4         5      4      4         Ok
9         2      3      9         3      4      7         Ok

4. Parallel Benchmark

To test the parallel efficiency of the element splitting strategy presented in the previous pages, a simple homogeneous refinement example is executed. Consider a cubic domain identified by its corners (-1,-1,-1) and (1,1,1), initially meshed using slightly over one million tetrahedral elements. This domain is refined homogeneously in two passes, first splitting all of its elements to obtain around 8 million elements and then again for a total of 64 million elements.

Table 5
Time required to refine a 1 million element mesh twice.

# Threads   First step              Second step
            Time (s)   Speedup      Time (s)   Speedup
4           21.16      1.0          170.74     1.0
8           10.57      2.0          88.09      1.9
16          5.71       3.7          48.99      3.5
32          3.25       6.5          24.99      6.8

Computations were performed using up to three blades, each containing two six-core Intel Xeon E5645 CPUs (2.40 GHz, 48 GB RAM), connected using InfiniBand. The time required to perform this operation using an increasing number of processors is recorded in Table 5 and presented in graphical form in Fig. 4. The results show that the refinement algorithm exhibits good parallel performance, as expected.
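The speedups reported in Table 5 are relative to the four-thread run, so it can be useful to convert them into a parallel efficiency (speedup divided by the ideal speedup). The following Python sketch does this from the wall times of Table 5; the function name `speedup_and_efficiency` and the data layout are illustrative assumptions.

```python
# Wall times in seconds from Table 5: threads -> (first step, second step).
TIMINGS = {
    4: (21.16, 170.74),
    8: (10.57, 88.09),
    16: (5.71, 48.99),
    32: (3.25, 24.99),
}


def speedup_and_efficiency(timings, base_threads=4):
    """Return {threads: [(speedup, efficiency) per refinement step]}.

    Speedup is measured against the run with `base_threads` threads and
    efficiency is the speedup divided by the ideal value threads/base_threads.
    """
    base_times = timings[base_threads]
    result = {}
    for threads in sorted(timings):
        ideal = threads / base_threads
        result[threads] = [
            (base / time, (base / time) / ideal)
            for base, time in zip(base_times, timings[threads])
        ]
    return result


for threads, steps in speedup_and_efficiency(TIMINGS).items():
    print(threads, [(round(s, 1), round(e, 2)) for s, e in steps])
```

With these numbers the efficiency relative to four threads stays above 80% at 32 threads for both refinement passes.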

5. A simple subscale-based error indicator

In the present section an error estimation technique for incompressible flow problems will be introduced. This technique is not directly tied to the refinement strategy described in the previous sections, but is presented as a practical example of a criterion to drive the adaptive refinement algorithm that can be used in application examples.

This refinement technique is closely related to the variational multiscale method for the stabilization of the Navier–Stokes equations, introduced in [22,23]. To provide a context for the error estimator we will proceed to briefly describe the problem.

5.1. Variational multiscale formulation of the incompressible Navier–Stokes equations

The starting point of the formulation are the incompressible Navier–Stokes equations

\[ \partial_t u + u \cdot \nabla u - \nabla \cdot (2\nu \nabla^s u) + \nabla p = f \tag{1} \]

\[ \nabla \cdot u = 0 \tag{2} \]

where $u$ and $p$ represent velocity and pressure, $\nu$ is the fluid's kinematic viscosity, $f$ is the vector of external forces and $\nabla^s u := \frac{1}{2}\nabla u + \frac{1}{2}(\nabla u)^T$ is the symmetric part of the velocity gradient.

As Eq. (1) contains a non-linear convective term, $u \cdot \nabla u$, we will first linearize it using Picard's method. Using the index $i$ to denote a result from the previous iteration, we obtain the linearized momentum equation

\[ \partial_t u + u^i \cdot \nabla u - \nabla \cdot (2\nu \nabla^s u) + \nabla p = f \tag{3} \]

If $f$ and $g$ are functions such that their product $fg$ is integrable in the domain $\Omega$, the following standard compact notation can be introduced

\[ \int_\Omega f g \, d\Omega = (f, g)_\Omega \]

similarly, $(\cdot,\cdot)_{\Omega_e}$ will be used to denote integrals over a single element.

By multiplying Eqs. (3) and (2) by test functions $v$, $q$ and integrating over the fluid domain $\Omega$ the Galerkin weak form of the Navier–Stokes equations is obtained.

\[ (v, \partial_t u)_\Omega + (v, u^i \cdot \nabla u)_\Omega + 2\nu (\nabla^s v, \nabla^s u)_\Omega - (\nabla \cdot v, p)_\Omega = (v, f)_\Omega \tag{4} \]

\[ (q, \nabla \cdot u)_\Omega = 0 \tag{5} \]

It is well known that the numerical solution of the Navier–Stokes equations runs into numerical instabilities due to the incompressibility constraint, as well as due to the convective term in convection-dominated flows. A series of stabilization techniques have been developed over the last decades to overcome these numerical instabilities, e.g. SUPG [25], GLS [24] or FIC [37,36,35]. One of them is the variational multiscale method, which we will use here, based on the division of the solution into large scale and small scale parts.

\[ u = u_h + \tilde{u}, \qquad p = p_h + \tilde{p} \]

The large scale part of the solution, $u_h$, $p_h$, represents the component of the exact solution that can be reproduced using a given spatial discretization, while the small scale part, or subscale, is the difference between the exact solution and the result of the discretized problem. By introducing the scale separation for the problem variables and test functions in Eqs. (4) and (5), two different sets of equations are obtained. Omitting the details of their derivation (which can be found for example in [12]), and neglecting some terms involving integrals over element boundaries or second derivatives of the test functions, the large scale equations read

\[ (v_h, \partial_t u_h)_\Omega + (v_h, u_h^i \cdot \nabla u_h)_\Omega + 2\nu (\nabla^s v_h, \nabla^s u_h)_\Omega - (\nabla \cdot v_h, p_h)_\Omega - \sum_e (u_h^i \cdot \nabla v_h, \tilde{u})_{\Omega_e} - \sum_e (\nabla \cdot v_h, \tilde{p})_{\Omega_e} = (v_h, f)_\Omega \tag{6} \]

\[ (q_h, \nabla \cdot u_h)_\Omega - \sum_e (\nabla q_h, \tilde{u})_{\Omega_e} = 0 \tag{7} \]

while the small scales are driven by

\[ (\tilde{v}, \partial_t \tilde{u})_\Omega + (\tilde{v}, u_h^i \cdot \nabla \tilde{u})_\Omega - 2\nu (\tilde{v}, \nabla \cdot \nabla^s \tilde{u})_\Omega - (\nabla \cdot \tilde{v}, \tilde{p})_\Omega = (\tilde{v}, f)_\Omega - (\tilde{v}, \partial_t u_h)_\Omega - (\tilde{v}, u_h^i \cdot \nabla u_h)_\Omega + 2\nu (\tilde{v}, \nabla \cdot \nabla^s u_h)_\Omega + (\nabla \cdot \tilde{v}, p_h)_\Omega \tag{8} \]

\[ (\tilde{q}, \nabla \cdot \tilde{u})_\Omega = -(\tilde{q}, \nabla \cdot u_h)_\Omega \tag{9} \]

The aim of the variational multiscale method is to solve the large scale equations, Eqs. (6) and (7), by modeling the effect the small scale terms $\tilde{u}$, $\tilde{p}$ have on them. The modeling terms that will be introduced are motivated by the small scale equations (Eqs. (8) and (9)), using an argument based on the small scale Green's function.

Fig. 4. Wall time and speedup for the parallel performance benchmark.

Observe that the small scale equations should be verified for all functions $\tilde{v}$ and $\tilde{q}$ in the spaces of velocity and pressure subscales respectively. As such, they can be considered as equations imposed over the $L^2$-projection of a differential equation onto the space of small scales. Using this observation, Eqs. (8) and (9) can be recast in differential form as

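\[ \partial_t \tilde{u} + u_h^i \cdot \nabla \tilde{u} - \nabla \cdot (2\nu \nabla^s \tilde{u}) + \nabla \tilde{p} = r_u(u_h, p_h) + \xi_h \tag{10} \]

\[ \nabla \cdot \tilde{u} = r_p(u_h) + \delta_h \tag{11} \]

where $r_u$ and $r_p$ represent the residuals of the momentum and mass equations applied to the large scale variables, defined as

\[ r_u(u_h, p_h) = f - \partial_t u_h - u_h^i \cdot \nabla u_h + \nabla \cdot (2\nu \nabla^s u_h) - \nabla p_h \tag{12} \]

\[ r_p(u_h) = -\nabla \cdot u_h \tag{13} \]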

and $\xi_h$ (and $\delta_h$) are such that, once added to the momentum (or mass) residuals, the sum belongs to the velocity (or pressure) small scale space.

Unfortunately, the space containing the small scale solutions is infinite-dimensional, unlike the large scale one, which is a finite element space, and must be approximated by a finite-dimensional space. The choice of approximation for the small scale space determines the definition of the projection terms $\xi_h$, $\delta_h$. One common choice is just assuming them to be zero, which is what was done in the original papers on the variational multiscale method and is denominated algebraic subgrid scales (ASGS) by Codina [11], a notation that will be followed here. An alternative is considering the space of subscales to be $L^2$-orthogonal to the space of large scales. This option was presented in [10,12], where it is called orthogonal subscales (OSS), and results in a method very similar to projection schemes. In this case, the projection terms $\xi_h$, $\delta_h$ are defined through the projection of the respective residual onto the large scales, which ensures that the right hand side in Eqs. (10) and (11) is orthogonal to the space of large scales:

\[ \xi_h = -P_{V_h}\left( f - \partial_t u_h - u_h^i \cdot \nabla u_h + \nabla \cdot (2\nu \nabla^s u_h) - \nabla p_h \right) \tag{14} \]

\[ \delta_h = P_{Q_h}\left( \nabla \cdot u_h \right) \tag{15} \]

5.2. The small-scale Green's function

Although Eqs. (8) and (9) define the subscale velocity and pressure, they are not usually solved in practice. Instead, the subscale terms that appear in the large scale Eqs. (6) and (7) are approximated using an argument based on the Green's function associated to Eqs. (10) and (11). This procedure will be introduced here, but the interested reader is directed to the foundational papers on variational multiscale methods, such as [23], for a more complete presentation.

Given a problem such as

\[ L u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega \]

the Green's function associated to $L$ is defined as a function $G : \Omega \times \Omega \to \mathbb{R}$ such that

\[ u(x) = \int_\Omega G(x, y) f(y) \, d\Omega \quad \forall x \in \Omega \tag{16} \]

where the integral has to be understood in a distributional sense.

Applying this concept to the equations that define the small scales, Eqs. (10) and (11), there exist $G_u$ and $G_p$, called small scale Green's functions for the velocity and for the pressure respectively, such that

\[ \tilde{u} = \int_\Omega G_u (r_u + \xi_h) \, d\Omega \]

\[ \tilde{p} = \int_\Omega G_p (r_p + \delta_h) \, d\Omega \]

Note that the small scale Green's functions defined by these equations are global. In the context of a spatial discretization, the Green's function for the subscales can be defined locally for each element, provided that the subscales are assumed to be zero on element boundaries, which is a common assumption in stabilized methods. In this way, local small scale Green's functions can be defined, using a single element as integration domain.

In practice, the local small scale Green's function is not calculated. Instead, an algebraic approximation is defined as

\[ \tilde{u} \approx \tau_1 (r_u + \xi_h), \qquad \tilde{p} \approx \tau_2 (r_p + \delta_h) \tag{17} \]

where the second order tensor $\tau_1$ and the scalar $\tau_2$ are called stabilization parameters, have dimensions of time and will have to be determined.

In the present work the stabilization parameters are implemented, according to the definitions in [11], as

\[ \tau_1 = \left( \frac{c_1 \nu}{h^2} + \frac{c_2 |u^i|}{h} \right)^{-1} I \tag{18} \]

\[ \tau_2 = \frac{h^2}{c_1 \tau_1} \tag{19} \]

where $I$ is the second order identity tensor, $h$ is a characteristic length of the element and the parameters take the values $c_1 = 4$, $c_2 = 2$ for linear elements.
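As an illustration of Eqs. (18) and (19), the following Python sketch evaluates the stabilization parameters for a single element. It is a minimal sketch, not the implementation used in the paper: the function name `stabilization_parameters` is hypothetical, and $\tau_1$ is represented by its scalar coefficient, since Eq. (18) is that coefficient times the identity tensor.

```python
def stabilization_parameters(nu, h, u_norm, c1=4.0, c2=2.0):
    """Evaluate the scalar part of tau_1 (Eq. 18) and tau_2 (Eq. 19).

    nu     -- kinematic viscosity
    h      -- characteristic element length
    u_norm -- norm of the convective velocity from the previous iteration
    c1, c2 -- algorithmic constants (4 and 2 for linear elements)
    """
    tau1 = 1.0 / (c1 * nu / h**2 + c2 * u_norm / h)
    tau2 = h**2 / (c1 * tau1)
    return tau1, tau2


# Example values chosen only for illustration.
tau1, tau2 = stabilization_parameters(nu=0.01, h=0.1, u_norm=1.0)
print(tau1, tau2)
```

Substituting Eq. (18) into Eq. (19) gives $\tau_2 = \nu + c_2 |u^i| h / c_1$, which is a convenient check on any implementation.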

Using Eq. (17), the concept of the small scale Green's function provides a closure for the large scale Eqs. (6) and (7), motivating an approximation to the subgrid terms $\tilde{u}$, $\tilde{p}$ that avoids the need to solve additional equations.
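To make the closure explicit, the approximation of Eq. (17) can be substituted into the subscale terms of the large scale equations. The expressions below are a sketch of that substitution, assuming the subscale terms enter Eqs. (6) and (7) as element-wise sums with a negative sign, as is usual in formulations of this type; they are a direct consequence of Eq. (17) rather than an additional result from the paper.

\[ -\sum_e (u_h^i \cdot \nabla v_h, \tilde{u})_{\Omega_e} - \sum_e (\nabla \cdot v_h, \tilde{p})_{\Omega_e} \approx -\sum_e \left( u_h^i \cdot \nabla v_h, \tau_1 (r_u + \xi_h) \right)_{\Omega_e} - \sum_e \left( \nabla \cdot v_h, \tau_2 (r_p + \delta_h) \right)_{\Omega_e} \]

\[ -\sum_e (\nabla q_h, \tilde{u})_{\Omega_e} \approx -\sum_e \left( \nabla q_h, \tau_1 (r_u + \xi_h) \right)_{\Omega_e} \]

All quantities on the right hand side depend only on the large scale unknowns $u_h$, $p_h$, so the stabilized problem can be solved on the finite element space alone.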

5.3. Subscale-based error estimation

From the point of view of error estimation, as pointed out in [23,18], the approximation to the subscales is ultimately proportional to the elemental residual of the large-scale equations. As such, it is a natural choice as an error estimator within the framework of variational multiscale methods. An approach based on this idea is analyzed in [18], where the performance of a subscale-based error estimator (using the formulation defined as ASGS above) is studied.

The error estimator used in the examples presented in this document is defined as

\[ \frac{\| \tau_1 (r_u + \xi_h) \|}{\| u_h^{avg} \|} \tag{20} \]

where $\| \cdot \|$ denotes the $L^2$ norm and $u_h^{avg}$ is an average large scale velocity on the domain. This error estimator is evaluated at the element centers by the AMR routine and, if it is found to be larger than a predefined tolerance, the element is refined.
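The refinement criterion based on Eq. (20) can be sketched as follows. This is an illustrative example only: the function names and the assumption that the subscale velocity $\tau_1 (r_u + \xi_h)$ has already been evaluated at each element center are hypothetical, and the Euclidean norm of the vector at the element center is used as a stand-in for the norm in Eq. (20).

```python
import math


def vector_norm(v):
    """Euclidean norm of a velocity vector given as a tuple of components."""
    return math.sqrt(sum(component * component for component in v))


def flag_elements(subscale_velocities, u_avg_norm, tolerance):
    """Return the indices of the elements whose indicator exceeds the tolerance.

    subscale_velocities -- one subscale velocity vector per element (element centers)
    u_avg_norm          -- norm of the average large scale velocity on the domain
    tolerance           -- relative threshold, e.g. 0.05 for 5%
    """
    flagged = []
    for element_id, u_sub in enumerate(subscale_velocities):
        if vector_norm(u_sub) / u_avg_norm > tolerance:
            flagged.append(element_id)
    return flagged


# Three elements with made-up subscale velocities; only the second is flagged.
subscales = [(0.01, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.02, 0.0)]
print(flag_elements(subscales, u_avg_norm=1.0, tolerance=0.05))
```

The returned list is the kind of element subset that would be handed to the splitting algorithm of the previous sections.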

6. Incompressible flow examples

To conclude, we present some examples of application of the refinement algorithm with the error estimation technique described in the previous section. The first two examples have been run using four processes on a desktop computer, while the last 3D case was run using 32 cores of the Atlante cluster. The Atlante cluster, located at the Instituto Tecnológico de Canarias, comprises 84 JS21 computation nodes (blades), each with two dual-core PPC970 processors at 2.5 GHz. Each blade has 8 GB RAM. Parallel communications are performed over a Myrinet network.

6.1. An illustrative small-scale example

To show a simple example application of the procedure outlined in this document, the results obtained for two-dimensional incompressible flow around a cylinder are presented here. This problem, taken from [12], has been solved using OSS stabilization. Let $D$ be the diameter of a cylinder centered on the origin of the domain $[-4D, 12D] \times [-4D, 4D]$. An inlet condition is imposed on the left side of the domain, with a velocity such that the Reynolds number computed with $D$ is Re = 100.

Starting from a mesh of 3984 triangular linear elements, the flow has been simulated during 60 s of flow, using a time step of 0.1 s. After an initial waiting phase, the AMR is started, using the subscale-based estimator introduced in the previous pages, checking the error every 20 solution steps for a total of 40 refinement passes (although the refinement is limited to three passes over the same area and a minimum element size is imposed, to preserve mesh quality and prevent excessive refinement on localized areas).

The simulation was run, for test purposes, using eight processes and distributed memory in a cluster at CIMNE. At the end of the simulation, a final mesh containing 11,666 elements was obtained, which is reproduced in Fig. 5. The mesh was distributed over the eight domain partitions as shown in Table 6. It can be seen that the refined area coincides with the region where vortices develop, which is what would be expected.

Observing the results in Table 6, it is evident that the mesh refinement is not homogeneous between the domains. This is reasonable, in the sense that a successful refinement strategy should concentrate the elements in critical areas, but not desirable from the point of view of parallel efficiency, and highlights the need to couple the refinement strategy with a load balancing algorithm when it is used in a parallel environment.

Table 6
Initial and final number of elements on each process.

Process   0      1      2     3     4      5      6      7      Total
Initial   496    510    501   493   494    490    505    495    3984
Final     1056   1871   836   740   1884   1694   1542   2043   11666

Fig. 5. Initial mesh (a), refined mesh (b) and final velocity contours (c) for the cylinder example.
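The loss of balance visible in Table 6 can be quantified with a simple imbalance factor, defined here as the largest partition size divided by the mean partition size. This metric and the function name `imbalance` are assumptions introduced for illustration; the paper does not define a specific measure.

```python
# Element counts per process, taken from Table 6.
INITIAL = [496, 510, 501, 493, 494, 490, 505, 495]
FINAL = [1056, 1871, 836, 740, 1884, 1694, 1542, 2043]


def imbalance(counts):
    """Ratio between the most loaded partition and the average load.

    A value of 1.0 means perfect balance; larger values mean that the most
    loaded process holds proportionally more elements than the average.
    """
    mean = sum(counts) / len(counts)
    return max(counts) / mean


print(round(imbalance(INITIAL), 2), round(imbalance(FINAL), 2))
```

Under this measure the initial partition is within a few percent of perfect balance, while after refinement the most loaded process holds roughly 40% more elements than the average.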

Fig. 6. Velocity solution and refined mesh for the rectangle example (panels at times 0.0 s, 0.4 s, 0.8 s, 1.2 s and 1.6 s).

6.2. Rectangular cylinder

Another example where the algorithm presented in this document was tested is the flow over a rectangular cylinder (20 × 1 m) with rounded corners (radius 0.3 m). The fluid properties were density $\rho = 1.225$ kg/m$^3$ and kinematic viscosity $\nu = 1.46 \times 10^{-5}$ m$^2$/s. The flow was defined by an incoming velocity of 20 m/s in the direction of the long edges of the rectangle. After an initial waiting phase while the solution develops, the AMR algorithm is used every 20 solution steps to refine the mesh, again limiting the maximum number of refinement passes over the same area, as well as the minimum element size. Unlike the previous case, the time step is now fixed at run time to maintain the elemental Courant–Friedrichs–Lewy number over 10. Another difference is that, in this case, after every refinement step some edge swapping is performed to improve mesh quality. The results of the simulation can be seen in Fig. 6, where it can again be appreciated that the refinement is concentrated on the wake of the body, as expected.

The initial mesh contains 37,829 nodes and 73,290 elements, which are increased to a total of 80,726 nodes and 159,084 elements after 1.6 s of simulation.

6.3. Flow around the Silsoe cube

As a final example, a simulation of a 3D incompressible flow problem will be presented. The geometry for this benchmark has been taken from [40], where measurements of the wind flow around a 6 m cube constructed at Silsoe Research Institute are presented.

Our simulation represents the flow around the cube when the incoming wind is perpendicular to one of its faces, and simulates a tetrahedral domain 108 m long in the direction of the flow, 48 m long in the perpendicular direction and 30 m high. On the inflow boundary, placed 60 m before the center of the cube, the following logarithmic velocity profile is imposed:

\[ u_z = u_* \left( \frac{1}{\kappa} \log\left( \frac{z u_*}{\nu} \right) + B \right) \tag{21} \]

where $u_z$ is the longitudinal velocity at height $z$, $u_* = 0.272$ is the friction velocity, $\nu = 1.51 \times 10^{-5}$ m$^2$/s the kinematic viscosity of air, $\kappa = 0.41$ von Kármán's constant and $B = 5.2$. The air density is considered to be $\rho = 1.225$ kg/m$^3$. A total time of 6 s has been simulated, using a time step of 0.1 s.
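The inlet profile can be evaluated with a few lines of Python. This sketch assumes the standard smooth-wall logarithmic law, $u_z = u_* \left( \log(z u_* / \nu) / \kappa + B \right)$, with the friction velocity expressed in m/s; the function name `inlet_velocity` and that choice of form are assumptions made for illustration and should be checked against the original equation before reuse.

```python
import math


def inlet_velocity(z, u_star=0.272, nu=1.51e-5, kappa=0.41, b=5.2):
    """Longitudinal inlet velocity at height z (in metres).

    Assumes the smooth-wall log law u_z = u_star * (ln(z * u_star / nu) / kappa + b),
    with the parameter values quoted in the text as defaults.
    """
    return u_star * (math.log(z * u_star / nu) / kappa + b)


# Sample the profile at a few heights up to the top of the 30 m domain.
for height in (1.0, 6.0, 30.0):
    print(height, round(inlet_velocity(height), 2))
```

Under these assumptions the velocity at the cube height (6 m) is close to 9 m/s and grows slowly with height, as expected for a logarithmic profile.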

The refinement algorithm has been run initially after 20 simulation steps and every 10 steps after that point, for a total of five refinement passes. Refinement has been performed for all elements where the error estimator evaluated a subscale velocity larger than 5% of the average large-scale velocity, limited to two refinements over a single original element.

The results of the simulation at time 1 s, with the original mesh, and time 6 s, with the refined mesh, are presented in Fig. 7. The domain was meshed using 1.6 million elements initially and, after five refinement steps, the algorithm produced a domain with a total of 5.3 million elements. Note how the refined elements are concentrated near the main features of the flow, and correctly capture the formation of the horseshoe vortex on the front of the cube.

Fig. 7. Pressure results on the midplane for the flow over the Silsoe cube. (a) Solution at time 1 s, original mesh. (b) Solution at time 6 s, refined mesh.

7. Concluding remarks

This document presents a mesh refinement algorithm designed with its use in a parallel environment in mind. The proposed solution is relatively simple to implement, as it is based on structures that are commonly provided by linear algebra libraries, such as distributed sparse matrices, which will already be available in most parallel finite element codes.

The procedure presented has three basic components. The first of them is an element-driven global splitting algorithm, which identifies the elements that must be subdivided and communicates this information to all processes. This component relies on the second one, an error estimation strategy, which identifies the areas where mesh resolution is insufficient. The third component of the algorithm is a local refinement procedure, which subdivides the existing elements in a way that ensures that they will be conformant with their neighbors.

Error estimation is dependent on the physical formulation of the problem that is solved. This paper presents one choice of error estimator, specific to incompressible flow problems, which has been used in some simple examples. This is just one possible estimator, and other options will be more desirable for other problems, but the main refinement algorithm is not problem-dependent, and can be used to refine any arbitrary subset of elements in the domain.

An important question that has not been addressed in this document is that, when the finite element mesh is adaptively refined, new elements will be created in localized areas of the domain and, as a result, the number of elements in each parallel subdomain will change. A crucial line of future work will be defining a strategy to preserve load balance when this happens. Another line of improvement is to provide a mesh coarsening strategy, to reverse the refinements performed if the error is found to be sufficiently small in later time steps.

A second important line of future research is the improvement of the mesh quality of the refined meshes. For example, in the case of stretched or badly shaped elements it is potentially interesting to employ the ideas of the longest-edge refinement techniques.

Acknowledgements

The authors wish to acknowledge the support of the Spanish Ministerio de Ciencia e Innovación through the E-DAMS project and the European Commission through the Realtime project. In addition, the authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Red Española de Supercomputación. Nelson Lafontaine thanks the MAEC-AECID scholarships for the financial support given. J. Cotela gratefully acknowledges the support of the Spanish Ministerio de Educación through a doctoral grant in the FPU program and the Col·legi d'Enginyers de Camins, Canals i Ports.

References

[1] Multiphysics Kratos. .

[2] Tetrahedra split routine. .

[3] Babuška I, Miller A. A posteriori error estimates and adaptive techniques for the finite element method. Technical report BN-968, Institute for Physical Science and Technology, University of Maryland, 1981.

[4] Balman Mehmet. Parallel tetrahedral mesh refinement. Master thesis. Bogazici University, 2000, January 2006.

[5] Stepleman Robert S. Scientific computing: applications of mathematics and computing to the physical sciences. North-Holland Pub. Co; 1983.

[6] Bänsch Eberhard. An adaptive finite-element strategy for the three-dimensional time-dependent Navier–Stokes equations. J Comput Appl Math 1991;36(1):3–28.

[7] Barry William J, Jones Mark T, Plassmann Paul E. Parallel adaptive mesh refinement techniques for plasticity problems. Adv Eng Softw 1998;29:217–25.

[8] Botkin Mark E, Wang Hui-Ping. An adaptive mesh refinement of quadrilateral finite element meshes based upon an a posteriori error estimation of quantities of interest: modal response. Eng Comput. Lond. 2004:38–44.

[9] Chellamuthu KC, Ida N. Algorithms and data structures for 2D and 3D adaptive finite element mesh refinement. Finite Elem Anal Des 1994;17(3):205–29.

[10] Codina Ramon. Stabilization of incompressibility and convection through orthogonal sub-scales in finite element methods. Comput Meth Appl Mech Eng 2000;190(13–14):1579–99.

[11] Codina Ramon. A stabilized finite element method for generalized stationary incompressible flows. Comput Meth Appl Mech Eng 2001;190(20–21):2681–2706.

[12] Codina Ramon. Stabilized finite element approximation of transient incompressible flows using orthogonal subscales. Comput Meth Appl Mech Eng 2002;191(39–40):4295–321.

[13] De Cougny HL, Shephard MS. Parallel refinement and coarsening of tetrahedral meshes. Int J Numer Meth Eng 1999;46(7):1101–25.

[14] Dadvand Pooyan, Rossi Riccardo, Oñate Eugenio. An object-oriented environment for developing finite element codes for multi-disciplinary applications. Arch Comput Meth Eng 2010;17:253–97.

[15] Grätsch Thomas, Bathe Klaus-Jürgen. A posteriori error estimation techniques in practical finite element analysis. Comput Struct 2005;83(4–5):235–65.

[16] Grosso Roberto, Lürig Christoph, Ertl Thomas. The multilevel finite element method for adaptive mesh optimization and visualization of volume data. In: Proceedings of the 8th conference on visualization '97, VIS '97, Los Alamitos, CA, USA. IEEE Computer Society Press; 1997. p. 387ff.

[17] Hertel R, Kronmüller H. Adaptive finite element mesh refinement techniques in three-dimensional micromagnetic modeling. IEEE Trans Magnet 1998;34(6):3922–30.

[18] Hauke Guillermo, Doweidar Mohamed H, Miana Mario. The multiscale approach to error estimation and adaptivity. Comput Meth Appl Mech Eng 2006;195(13–16):1573–93 [A tribute to Thomas J.R. Hughes on the occasion of his 60th birthday].

[19] Díaz Morcillo Alejandro, Nuño Luis, Balbastre Juan V, Hernández David Sánchez. Adaptative mesh refinement in electromagnetic problems. In: Proceedings of the ninth international meshing roundtable. New Orleans, Louisiana; 2000.

[20] Heroux Michael, Bartlett Roscoe, Hoekstra Vicki Howle Robert, Hu Jonathan, Kolda Tamara, Lehoucq Richard, et al. An overview of Trilinos. Technical report SAND2003-2927, Sandia National Laboratories, 2003.

[21] Jansson N, Hoffman J, Jansson J. Parallel adaptive FEM CFD. Technical report KTH-CTL-4008, Computational Technology Laboratory, 2010.

[22] Hughes Thomas JR. Multiscale phenomena: Green's functions, the Dirichlet-to-Neumann formulation, subgrid scale models, bubbles and the origins of stabilized methods. Comput Meth Appl Mech Eng 1995;127(1–4):387–401.

[23] Hughes Thomas JR, Feijóo Gonzalo R, Mazzei Luca, Quincy Jean-Baptiste. The variational multiscale method: a paradigm for computational mechanics. Comput Meth Appl Mech Eng 1998;166(1–2):3–24 [Advances in stabilized methods in computational mechanics].

[24] Hughes Thomas JR, Franca Leopoldo P, Hulbert Gregory M. A new finite element formulation for computational fluid dynamics: VIII. The Galerkin/least-squares method for advective–diffusive equations. Comput Methods Appl Mech Eng 1989;73(2):173–89.

[25] Hughes Thomas JR, Mallet Michel. A new finite element formulation for computational fluid dynamics: III. The generalized streamline operator for multidimensional advective–diffusive systems. Comput Methods Appl Mech Eng 1986;58(3):305–28.

[26] Raizer A, Meunier G, Coulomb JL. An approach for automatic adaptive mesh refinement in finite element computation of magnetic fields. IEEE Trans Magnet 1989;25(4):2965–7.

[27] Jones Mark T, Plassmann Paul E. Adaptive refinement of unstructured finite-element meshes. Finite Elem Anal Des 1997;25(1–2):41–60.

[28] Kirk BS, Peterson JW, Stogner RH, Carey GF. libMesh: a C++ library for parallel adaptive mesh refinement/coarsening simulations. Eng Comput 2006;22(3–4):237–54.

[29] Kossaczky Igor. A recursive approach to local mesh refinement in two and three dimensions. J Comput Appl Math 1994;55(3):275–88.

[30] Lian Y-Y, Hsu K-H, Shao Y-L, Lee Y-M, Jeng Y-W, Wu J-S. Parallel adaptive mesh-refining scheme on a three-dimensional unstructured tetrahedral mesh and its applications. Comput Phys Commun 2006:721–37.

[31] Lian Y-Y, Hsu K-H, Shao Y-L, Lee Y-M, Jeng Y-W, Wu J-S. Parallel adaptive mesh-refining scheme on a three-dimensional unstructured tetrahedral mesh and its applications. Comput Phys Commun 2006;175(11–12):721–37.

[32] Liu Anwei, Joe Barry. Quality local refinement of tetrahedral meshes based on bisection. SIAM J Sci Comput 1995;16(6):1269–91.

[33] Field TW, Nehl DA. Adaptive refinement of first order tetrahedral meshes for magnetostatics using local Delaunay subdivisions. IEEE Trans Magnet 1991;27(5):4193–6.

[34] Rodriguez Ríos, Gustavo A, Storti MA, Nigro NM. Adaptive refinement of unstructured finite element meshes for compressible flows. Mecánica Computacional 2009;27(5):1283–95.

[35] Oñate E, Valls A, García J. Computation of turbulent flows using a finite calculus–finite element formulation. Int J Numer Methods Fluids 2007;54(6–8):609–37.

[36] Oñate Eugenio, Miquel Juan, Hauke Guillermo. Stabilized formulation for the advection–diffusion–absorption equation using finite calculus and linear finite elements. Comput Meth Appl Mech Eng 2006;195(33–36):3926–3946.

[37] Oñate Eugenio, Zárate Francisco, Idelsohn Sergio R. Finite element formulation for convective–diffusive problems with sharp gradients using finite calculus. Comput Meth Appl Mech Eng 2006;195(13–16):1793–825 [A tribute to Thomas J.R. Hughes on the occasion of his 60th birthday].

[38] Pang Boluan. A tetrahedral refinement algorithm for adaptive finite element methods in electromagnetics. Master thesis. McGill University, Montreal, August 2009.

[39] Pébay Philippe P, Thompson David C. Parallel mesh refinement without communication. In: Proceedings 13th international meshing roundtable, Williamsburg, USA, 2004.

[40] Richards PJ, Hoxey RP, Short LJ. Wind pressures on a 6 m cube. J Wind Eng Ind Aerodyn 2001;89(14–15):1553–64.

[41] Rivara M-C. Mesh refinement processes based on the generalized bisection of simplices. SIAM J Numer Anal 1984;21:604–13.

[42] Rivara Maria-Cecilia. Mesh refinement processes based on the generalized bisection of simplices. SIAM J Numer Anal 1984;21(3):604–13.

[43] Sewell EG. Automatic generation of triangulation for piecewise polynomial approximation. Ph.D. Thesis, Purdue Univ., West Lafayette, January 1972.

[44] Zhang L Bo. A parallel algorithm for adaptive local refinement of tetrahedral meshes using bisection. Tech. rep. preprint ICM-05-09, Institute of Computational Mathematics and Scientific/Engineering Computing, 2005.
