
J Eng Math (2011) 71:131–155
DOI 10.1007/s10665-010-9436-2

PeliGRIFF, a parallel DEM-DLM/FD direct numerical simulation tool for 3D particulate flows

Anthony Wachs

Received: 14 April 2010 / Accepted: 2 November 2010 / Published online: 30 November 2010
© Springer Science+Business Media B.V. 2010

Abstract The problem of particulate flows at moderate to high concentration and finite Reynolds number is addressed by parallel direct numerical simulation. The present contribution is an extension of the work published in Computers & Fluids 38:1608 (2009), where systems of moderate size in a 2D geometry were examined. At the numerical level, the suggested method is inspired by the framework established by Glowinski et al. (Int J Multiph Flow 25:755, 1999) in the sense that their Distributed Lagrange Multiplier/Fictitious Domain (DLM/FD) formulation and their operator-splitting idea are employed. In contrast, particle collisions are handled by an efficient Discrete Element Method (DEM) granular solver, which allows one to consider both smoothly (sphere) and non-smoothly (angular polyhedron) shaped particles. From a computational viewpoint, a basic though efficient strategy has been developed to implement the DLM/FD method in a domain decomposition/distributed fashion. To achieve this goal, the serial code GRIFF (GRains In Fluid Flow; see Comput Fluids 38:1608–1628, 2009) is upgraded to fully MPI capabilities. The new code, PeliGRIFF (Parallel Efficient Library for GRains in Fluid Flow), is developed under the framework of the fully MPI open-source platform PELICANS. The parallel computing capabilities of PeliGRIFF offer new perspectives in the study of particulate flows and indeed increase the number of particles usually simulated in the literature. Solutions to address new issues raised by the parallelization of the DLM/FD method and assess the scalable properties of the code are proposed. Results on the 2D/3D sedimentation of a significant collection of isometric polygonal/polyhedral particles in a Newtonian fluid with collisions are presented as a validation test and an illustration of the class of particulate flows PeliGRIFF is able to investigate.

Keywords Discrete Element Method · Distributed Lagrange Multiplier/Fictitious Domain method · Parallel computing · Particulate flow · Polygonal/polyhedral shape · Sedimentation

1 Introduction

Solid/solid and fluid/solid interactions in moderately to highly concentrated particulate flows are widely encountered in both industrial applications and environmental phenomena. The hydrodynamics of such complex flows is still only partially understood, even when the fluid phase exhibits simple Newtonian properties and the particles are monodisperse spheres (3D) or circular cylinders (2D). It is commonly accepted nowadays that the main issue in this type of flow concerns the broad range of scales interacting together to form complex flow patterns such as particle clustering, wave propagation, hydrodynamic instabilities and so on. A quick inspection reveals that the smallest scale is that of a single solid particle, whereas the largest one is often the size of the flow domain, which might easily be at least a few hundred-fold larger. In such flows, the coupling between the fluid flow and the particle motion cannot be ignored as soon as the solid volume fraction exceeds a few percent, essentially because it is no longer appropriate to assume that solid particles are simply conveyed without altering the flow itself. Among all industries concerned with particulate flows, let us take as an example the Oil and Gas industry, in which the number of processes involving particle sedimentation, fluidisation or transport is fairly large.

A. Wachs, Fluid Mechanics Department, IFP Energies Nouvelles, 1 & 4, avenue de Bois Préau, 92852 Rueil Malmaison Cedex, France. e-mail: [email protected]

Here, we are interested in the direct numerical simulation of large-scale particulate flows at moderate to high concentration and finite Reynolds number Re. In the previous sentence, the term "large scale" refers to a number of particles significantly larger than what is currently reachable by less sophisticated implementations or by DNS methods other than ours. It is worth mentioning that, in general, the number of particles computed by the most modern DNS codes is still tremendously low compared to practical applications, where millions or even billions of particles interact, but it is hopefully getting closer to lab-scale experiments. As the concentration of solid bodies suspended in the fluid exceeds roughly 5%, the probability of collision between particles increases dramatically. As a consequence, proper contact laws and a model (or numerical method) to handle the numerous multi-body collisions are required. Besides, as Re exceeds 1, the fluid and particle inertia can no longer be ignored and need to be incorporated in the governing equations, in order to model properly the nonlinear mechanisms that control the migration and rotation of particles and that are lacking in Stokes flow. Our objective is a parallel approach based on the combined advantages of the Distributed Lagrange Multiplier/Fictitious Domain (DLM/FD) method for fluid/solid interactions and the Discrete Element Method (DEM) for multi-body solid collisions. Our method uses the DLM/FD formulation of Glowinski et al. [1] together with their operator-splitting idea to facilitate computations (see also [2, Chaps. 8, 9]), but here the collision step is handled by the DEM solver. Our DEM solver manages particles of arbitrary (at least convex) shape and various sizes. As two particles collide, the soft-sphere approach allows them to slightly overlap, and collision forces are calculated based on the overlapping region. In the overlapping region, to avoid enforcing the conflicting rigid-body-motion constraints of the two colliding particles at the same velocity node, in which case the problem would be overconstrained, we advocate the same strategy as the one we employed in [3] (and originally suggested by Singh et al. [4]), i.e., we impose the constraint of the particle whose center of mass is closer to that node. As already highlighted in our previous work [3], we illustrate in Sect. 5 that the simulation of 2D/3D particulate flows with polygonal/polyhedral particles and actual collisions does not cause any difficulties. The overall method keeps the strong and robust convergence properties of standard DLM/FD methodologies.

The number of particles that can reasonably be simulated in a DNS approach appears to be a crucial matter of concern. Although DNS provides deep insight into the fluid/solid interactions, since the velocity and pressure fields are fully resolved around the solid bodies, the computing time is prohibitive, and in most serial implementations a few hundred (respectively, thousand) particles in 3D (respectively 2D) is the attainable upper bound. Using their original DLM/FD method and a serial implementation, Glowinski and co-workers reported in 2001 the simulation of the sedimentation of 6,400 circular cylinders in 2D and the fluidization of 1,204 spheres in 3D in [5], and even up to 11,340 circular cylinders in 2D in [6]. Feng and Michaelides [7], in 2005, used their code Proteus and matched the same performance by simulating the settling of 1,232 spheres thanks to an IB/LBM (Immersed Boundary/lattice-Boltzmann Method). In both these contributions, computations were performed with a serial code. Nevertheless, they showed the formation of large hydrodynamic recirculations that involve tens to hundreds of particles. In 2005, Uhlmann [8] implemented a similar IB method and studied the sedimentation of 1,000 spherical particles in a periodic box with a fully MPI implementation, which enabled him to consider fine meshes, up to 512 × 512 × 1,024 grid nodes. Computations were run on 64 and 128 processors. With other kinds of approach where the flow is solved in an averaged sense, i.e., with local averaging of physical properties over a fluid computational cell that contains many particles, a larger number of particles can be simulated. For instance, Tsuji et al. [9,10] showed impressive results with up to 16 million particles, but in their approach the fluid/solid interaction relies essentially on a drag-force correlation and is consequently not as precisely described as in a DNS method like DLM/FD or IB.


Very recently, Jin et al. [11] published a remarkable work on a parallel implementation of their non-Lagrange-multiplier-based fictitious domain formulation for particulate flows [12]. The numerical strategy uses a standard domain decomposition to distribute computations over processes, and the linear algebra part of their code relies on the parallel library PETSc. They presented impressive results on the separation of 21,136 spheres in a cuboid box, a 3D computation that had never been reported before in the literature. The finest grid run involved around 20 million finite elements and 28 million velocity nodes on 80 processors. Although the dimensionless time step used in this computation was excessively large ($\Delta t = 0.1$), the time evolution of the flow seems realistic, and smaller systems (around one thousand spheres) computed with a more appropriate time step yield satisfactory agreement with macroscopic laws such as Richardson–Zaki in sedimentation/fluidisation. They also reported decent scalability of their code. Compared to Jin et al. [11], we employ here the Lagrange-multiplier-based version of the fictitious domain method (DLM/FD) together with a sophisticated collision method to treat particles of complex polyhedral shape. Although the non-Lagrange-multiplier-based version is much easier to implement, our choice is motivated by the two following reasons:

(1) to provide solutions of reasonable accuracy; in fact, it has been shown in [2] that non-Lagrange-multiplier fictitious domain methods provide less accurate solutions than their Lagrange-multiplier-based counterpart;

(2) to assess the feasibility of a parallel solution of the DLM/FD saddle-point problem.

Hence, the primary goal of this paper concerns the extension of the DLM/FD method to domain decomposition. We present later a fairly simple way of solving the saddle-point problem associated with the DLM/FD problem, together with a practical strategy to distribute particles over the sub-domains (processes) in which only basic, inexpensive MPI communications are necessary. Finally, we give rules on overlapping zones such that the method we suggested in [3] to construct the set of DLM/FD points on each particle remains valid on a decomposed domain.

Being able to simulate a large number of particles is an objective in itself from the purely computational viewpoint but, more importantly, it opens up a broad new range of classes of problems that can be investigated with this type of approach. In other words, if, in a given situation, experiments reveal that hydrodynamic structures involving thousands of particles manifest in the process, and the code used to simulate this phenomenon cannot handle more than a few hundred, this situation simply cannot be examined. In this perspective, a fully MPI implementation running on large clusters is a real breakthrough. Compared to our previous work [3,13,14], we present here our new code PeliGRIFF. This new version of our DEM-DLM/FD approach is fully MPI and enables us to extend significantly the number of particles present in our systems.

In Sects. 2 and 3, we briefly recall the DLM/FD formulation and the principle of our DEM solver (more details can be found in [3]). In Sect. 4, we sum up the computational ingredients of our numerical strategy and focus on the parallel solution of the DLM/FD saddle-point problem as well as the construction of the set of DLM/FD points on distributed sub-domains. In Sect. 5, we validate the 3D implementation of PeliGRIFF on a single particle of spherical shape settling in a box and compare the computed results to data available in the literature. Besides, we assess the scalable properties of the code on the sedimentation of 6,400 particles in 2D. Finally, we illustrate the original capabilities of our code on the settling of 4,000 cubes in 3D.

Preliminary remark: As is customary in distributed computing, we shall talk about processes instead of processors. However, in practice, i.e., on the cluster, each processor hosts a single process, which implies that a 64-process run and a 64-processor run mean the same thing.

2 Governing equations

Let $\Omega$ be a bounded domain of $\mathbb{R}^d$, $d \in \{2, 3\}$, and denote by $\partial\Omega$ the boundary of $\Omega$. Suppose that $\Omega$ is filled with $N_P$ rigid particles $P_i(t)$, $i = 1, \ldots, N_P$. For simplicity, we consider $N_P = 1$, the extension to the multi-body case being straightforward. It should be noted that we shall work with dimensionless quantities throughout the whole article and distinguish any dimensional quantity by a "star" symbol.


In the formulation below, we consider the case of mixed boundary conditions. Let us assume that $\partial\Omega$ can be sub-divided into $\Gamma_0$ and $\Gamma_1$ on which the velocity $u^*$ and pressure $p^*$ fields satisfy:
$$u^* = u^*_{\Gamma_0} \quad \text{on } \Gamma_0, \tag{1}$$
$$(2\eta^* D^*(u^*) - p^* I)\,n^* = g^*_{\Gamma_1} \quad \text{on } \Gamma_1, \tag{2}$$
where $n^*$ is the unit outward normal vector to $\Gamma_1$, $\eta^*$ the fluid viscosity and $D^* = \frac{1}{2}(\nabla u^* + \nabla u^{*t})$ the rate-of-strain tensor. The governing equations are non-dimensionalized by introducing the following scales: $L_c$ for length, $U_c$ for velocity, $L_c/U_c$ for time, $\rho_f^* U_c^2$ for pressure and $\rho_f^* U_c^2/L_c$ for the rigid-body-motion Lagrange multiplier. The variational combined momentum equations that govern both the fluid and solid motion read [1]:

(1) Combined momentum equations
$$\int_\Omega \left(\frac{\partial u}{\partial t} + (u \cdot \nabla)u\right) \cdot v \,\mathrm{d}x - \int_\Omega p\, \nabla \cdot v \,\mathrm{d}x + \frac{1}{Re_c} \int_\Omega 2 D(u) : D(v) \,\mathrm{d}x + \int_{P(t)} \lambda \cdot v \,\mathrm{d}x = \int_{\Gamma_1} g_{\Gamma_1} \cdot v \,\mathrm{d}\Gamma, \quad \forall v \in V_0(\Omega), \tag{3}$$
$$(\rho_r - 1)\left[V_P \left(\frac{\partial U}{\partial t} - Fr \frac{g^*}{g^*}\right) \cdot V + \left(I_P \frac{\partial \omega}{\partial t} + \omega \times I_P \cdot \omega\right) \cdot \xi\right] - \sum_j F_{cj} \cdot V - \sum_j F_{cj} \cdot \xi \times R_j - \int_{P(t)} \lambda \cdot (V + \xi \times r_{GM}) \,\mathrm{d}x = 0, \quad V \in \mathbb{R}^d,\ \xi \in \mathbb{R}^{\tilde d}, \tag{4}$$
$$\int_{P(t)} \alpha \cdot \left(u - (U + \omega \times r_{GM})\right) \mathrm{d}x = 0, \quad \forall \alpha \in \Lambda(t); \tag{5}$$

(2) Continuity equation
$$-\int_\Omega q\, \nabla \cdot u \,\mathrm{d}x = 0, \quad \forall q \in L^2(\Omega). \tag{6}$$

Above, $u \in V_{\Gamma_0}(\Omega)$, $p \in L^2(\Omega)$ if $\Gamma_1 \neq \emptyset$ (respectively, $p \in L^2_0(\Omega)$ if $\Gamma = \Gamma_0$), $\lambda \in \Lambda(t)$ denotes the distributed Lagrange-multiplier vector, $U \in \mathbb{R}^d$ the particle translational velocity vector, $\omega \in \mathbb{R}^{\tilde d}$ the particle angular-velocity vector, $\tilde d$ the number of non-zero components of $\omega$ (if $d = 2$, $\omega = (0, 0, \omega_z)$ and $\tilde d = 1$, else $\tilde d = d$), $(v, q, \alpha, V, \xi)$ the test functions for $(u, p, \lambda, U, \omega)$ respectively, $F_{cj} \in \mathbb{R}^d$ the contact forces, $R_j \in \mathbb{R}^d$ the vectors between the particle center of mass $G$ and the contact points, $r_{GM}$ the position vector with respect to the particle center of mass $G$, $V_P = M^*/(\rho_s^* L_c^d) \in \mathbb{R}$ the dimensionless particle volume, $M^*$ the particle mass, $I_P = I_P^*/(\rho_s^* L_c^{d+2}) \in \mathbb{R}^{d \times d}$ the dimensionless particle-inertia tensor, $\rho_s^* \in \mathbb{R}$ the particle density, $g^* \in \mathbb{R}^d$ the gravity-acceleration vector, $g^* \in \mathbb{R}$ the gravity-acceleration modulus, $Re_c = \rho_f^* U_c L_c / \eta^*$ the Reynolds number, $\rho_f^* \in \mathbb{R}$ the fluid density, $Fr = g^* L_c / U_c^2$ the Froude number and $\rho_r = \rho_s^* / \rho_f^*$ the density ratio.

In the above equations, we have introduced the following functional spaces:
$$V_0(\Omega) = \left\{ v \in H^1(\Omega)^d \,\middle|\, v = 0 \text{ on } \Gamma_0 \right\}, \tag{7}$$
$$V_{\Gamma_0}(\Omega) = \left\{ v \in H^1(\Omega)^d \,\middle|\, v = u_{\Gamma_0} \text{ on } \Gamma_0 \right\}, \tag{8}$$
$$L^2_0(\Omega) = \left\{ q \in L^2(\Omega) \,\middle|\, \int_\Omega q \,\mathrm{d}x = 0 \right\}, \tag{9}$$
$$\Lambda(t) = H^1(P(t))^d. \tag{10}$$


Fig. 1 Contact between two particles: $G_i$ and $G_j$ denote the centers of mass of particles $i$ and $j$, respectively, M the contact point, $n$ and $t$ the unit normal and tangential vectors at the contact point, respectively, and $\delta_{ij}$ the overlapping distance

3 Collision model: DEM solver

Binary hard-sphere models and soft-sphere models are the two categories of collision model for particulate flows (for more detail see [15, Chap. 5]). In a hard-sphere model, the momentum exchange between two colliding particles takes place exactly at the instant when the two particles touch. In contrast, in a soft-sphere model, the velocity of the colliding particles is determined from Newton's equations of motion, with collision forces derived from a soft potential that is a function of the separation or overlap distance between the particles and possibly of their relative velocity [16,17], as shown in Fig. 1. In our DEM granular solver, the collision forces considered comprise:

• an elastic restoring force
$$f_{el} = k_n \delta_{ij} \, n, \tag{11}$$
where $k_n$ denotes the normal contact stiffness, $\delta_{ij}$ the overlapping distance between particles $i$ and $j$ and $n$ the unit normal vector pointing between the centers of mass of particles $i$ and $j$;

• a viscous dynamic force
$$f_{dn} = -2\gamma_n m_{ij} U_{rn}, \tag{12}$$
in the normal direction, to account for the dissipative aspect of the contact, where $\gamma_n$ is the normal dynamic friction coefficient, $m_{ij} = \frac{M_i M_j}{M_i + M_j}$ the reduced mass of particles $i$ and $j$ and $U_{rn}$ the normal relative velocity between particles $i$ and $j$;

• a tangential friction force
$$f_t = -\min\left\{\mu_c |f_{el}|, |f_{dt} + f_s|\right\} t, \tag{13}$$
$$f_{dt} = -2\gamma_t m_{ij} U_{rt}, \tag{14}$$
$$f_s = -k_s \int_0^{t_c} U_{rt} \,\mathrm{d}t, \tag{15}$$
where $f_{dt}$ denotes the dissipative frictional contribution, $\gamma_t$ the dissipative tangential friction coefficient, $U_{rt}$ the tangential relative velocity between particles $i$ and $j$, $f_s$ the static frictional contribution, which behaves like an incremental spring that stores energy during the contact time $t_c$, and $k_s$ the static friction coefficient. Note that the magnitude of the tangential friction force is limited by the Coulomb frictional limit calculated with the Coulomb dynamic friction coefficient $\mu_c$.

The total collision force acting on a particle $i$ is the sum of the contributions from the contacts with neighbouring particles $j$:
$$F_{ci} = \sum_j F_{cij} = \sum_j \left(f_{el} + f_{dn} + f_t\right)_{ij}. \tag{16}$$
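To make Eqs. (11)–(16) concrete, here is a minimal Python sketch that assembles the contact force for a single pair. It is not the GRIFF/PeliGRIFF implementation: the function name, the default parameter values and the sign conventions for the normal vector and relative velocity are assumptions of the example, and the accumulated tangential displacement (the integral in Eq. (15)) is assumed to be tracked outside the routine. Summing this contribution over all neighbouring particles $j$ gives the total force of Eq. (16).

```python
import numpy as np

def contact_force(delta, n, u_rel, m_i, m_j, tang_disp,
                  kn=1.0e5, gamma_n=50.0, gamma_t=50.0, ks=1.0e4, mu_c=0.5):
    """Soft-sphere collision force for one contact, following Eqs. (11)-(16).

    delta     : overlap distance delta_ij (> 0 when the particles overlap)
    n         : unit normal vector between the two centers of mass
    u_rel     : relative velocity of particle j with respect to particle i
    m_i, m_j  : particle masses
    tang_disp : accumulated tangential displacement (integral of U_rt over
                the contact time, updated outside this routine)
    """
    m_ij = m_i * m_j / (m_i + m_j)            # reduced mass
    u_rn = np.dot(u_rel, n) * n               # normal relative velocity
    u_rt = u_rel - u_rn                       # tangential relative velocity

    f_el = kn * delta * n                     # elastic restoring force, Eq. (11)
    f_dn = -2.0 * gamma_n * m_ij * u_rn       # normal viscous force, Eq. (12)

    f_dt = -2.0 * gamma_t * m_ij * u_rt       # dissipative tangential part, Eq. (14)
    f_s = -ks * tang_disp                     # static, spring-like part, Eq. (15)

    t_norm = np.linalg.norm(u_rt)
    t = u_rt / t_norm if t_norm > 1e-12 else np.zeros_like(u_rel)
    # Coulomb limit on the tangential force magnitude, Eq. (13)
    f_t = -min(mu_c * np.linalg.norm(f_el), np.linalg.norm(f_dt + f_s)) * t

    return f_el + f_dn + f_t                  # contribution to F_ci, Eq. (16)
```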


4 Computational features

A detailed presentation of our numerical method has been provided in [3]. For the sake of conciseness, we merely sum up below the main ingredients of our strategy and focus on two specific and crucial aspects of distributed computing: the parallel solution of the DLM/FD saddle-point problem and the construction of the set of DLM/FD points. These ingredients comprise:

• the solution algorithm is based on operator-splitting techniques, i.e., at each time step we solve the following sequence of sub-problems:

(1) a degenerate Stokes problem to impose incompressibility (divergence-free velocity field), solved by an Uzawa/Preconditioned Conjugate Gradient (PCG) iterative procedure;
(2) a pure advection problem, treated by a Taylor–Galerkin wave-like method;
(3) a pure diffusion (viscous) problem, solved by a PCG iterative procedure;
(4) a pure granular problem to predict particle velocities and positions;
(5) a DLM/FD problem to account for fluid/solid interactions, solved by an Uzawa/CG iterative procedure, thus correcting the particle and fluid velocities;
(6) a pure granular problem to correct particle positions (and possibly velocities in case of further collisions).

Compared to [3], we add the correction step 6, as suggested by Glowinski et al. [1] and used by us in [13,14], and we solve sub-problems 2 and 3 together as a single advection/diffusion problem when the CFL condition is fulfilled with the constant time step employed (a schematic sketch of this sequence is given at the end of this list of ingredients);

• the spatial discretization is of the Finite Element type, with triangular P1isoP2/P1 (Pironneau–Bercovier) and tetrahedral P2/P1 (Taylor–Hood) elements for the velocity and pressure fields in 2D and 3D, respectively;

• the spatial discretization of the distributed Lagrange multiplier field is based on the collocation-point method, which assumes that each particle is covered by a set of points on which the test functions are Dirac measures [1,3]. A special treatment is necessary when two particles collide (see [3] for more detail). We shall come back more specifically to this issue in Sect. 4.3;

• the granular sub-problem is solved by a second-order accurate leap-frog scheme, and a highly efficient linked-cell algorithm is employed to detect particle collisions [18];

• the fluid solver is parallelized with a classical domain decomposition technique implemented in the PELICANS platform, linked with the open-source PETSc library for all matrix operations. It should be pointed out that the fixed-mesh property of the DLM/FD method is indeed beneficial to the whole parallelization of the code. In fact, compared to more complex techniques like the Arbitrary Lagrangian–Eulerian method, which requires regular remeshing, the DLM/FD method only requires decomposing the whole computational domain once and for all, which greatly facilitates the implementation. In the case of rectangular (respectively cuboid in 3D) geometries, we obviously construct a structured mesh and are therefore able to decompose the whole computational domain into sub-domains of equal size distributed on each process. Hence, we produce a perfect initial load balancing between the processes, and this optimal load balancing is maintained throughout the whole computation since the mesh is fixed. In the case of a more complex geometry or an unstructured mesh, we use ParMetis to partition the computational domain;

• the mass matrix is always lumped and hence diagonal. This speeds up the convergence of saddle-point problems like the Stokes and DLM/FD ones, since the velocity operator need not be inverted by an iterative method. We have checked that the error introduced by the lumping approximation is always very small and is anyway compensated by the constraint of employing small time steps in the operator-splitting algorithm. As a consequence, there are basically two matrix systems to solve: a pressure Laplacian one, used as a preconditioner in the Stokes saddle-point sub-problem, and a viscous one for the advection/diffusion sub-problem. The latter is clearly easy and fast to solve since, due to the use of small time steps, the operator formally acting on the fluid velocity, i.e., the sum of the unsteady and viscous operators, is highly diagonally dominant. Any basic preconditioner, like Jacobi or block Jacobi with SSOR preconditioning provided by PETSc, is adequate to supply the solution in less than 20 iterations. The former, i.e., the pressure Laplacian matrix, however, is not diagonally dominant and requires the use of an efficient preconditioner, otherwise most of the computing time is spent in solving this matrix system in parallel. PETSc can be linked to other libraries and thus enables one to use a broad variety of preconditioners. Among them, the parallel algebraic multigrid preconditioners of the HYPRE library are particularly powerful. After various tests, the BoomerAMG preconditioner has proved to perform very well. Depending on the system size and accounting for the intrinsic cost of the BoomerAMG preconditioning step, the speed-up factor is usually between 3 and 20 compared to a basic Jacobi preconditioner, which is significant;

• the granular solver runs either in serial or in parallel mode depending on the solid-volume fraction. Usually, its computational cost does not exceed 20% of the whole computing time.
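As a reading aid, the skeleton below lays out the operator-splitting sequence of sub-problems (1)–(6) listed above for one time step. The solver calls are placeholder stubs invented for the illustration (PeliGRIFF itself is built on PELICANS in C++), so this sketches only the control flow, under the assumption that a `state` object carries the fluid fields and particle data.

```python
# Placeholder solvers: each stands for one sub-problem and simply returns the
# state unchanged; in a real code they would be full solvers.
def solve_stokes_uzawa_pcg(state, dt): return state        # sub-problem (1)
def solve_advection_taylor_galerkin(state, dt): return state  # sub-problem (2)
def solve_diffusion_pcg(state, dt): return state            # sub-problem (3)
def solve_advection_diffusion(state, dt): return state      # (2)+(3) merged
def dem_predict(state, dt): return state                    # sub-problem (4)
def solve_dlmfd_uzawa_cg(state, dt): return state           # sub-problem (5)
def dem_correct(state, dt): return state                    # sub-problem (6)

def advance_one_time_step(state, dt, cfl_ok=True):
    """Schematic operator-splitting sequence for one time step."""
    # (1) degenerate Stokes problem: enforce a divergence-free velocity field
    state = solve_stokes_uzawa_pcg(state, dt)

    if cfl_ok:
        # (2)+(3) advection and diffusion solved together when the CFL
        # condition is satisfied with the constant time step
        state = solve_advection_diffusion(state, dt)
    else:
        state = solve_advection_taylor_galerkin(state, dt)
        state = solve_diffusion_pcg(state, dt)

    # (4) granular prediction of particle velocities and positions (DEM)
    state = dem_predict(state, dt)
    # (5) DLM/FD problem: fluid/solid coupling, corrects both velocities
    state = solve_dlmfd_uzawa_cg(state, dt)
    # (6) granular correction of particle positions (and velocities if
    #     further collisions occur)
    state = dem_correct(state, dt)
    return state

state = advance_one_time_step({}, dt=2e-3)
```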

4.1 Distribution of particles on processes

Although the mesh is fixed in the DLM/FD method, particles move freely over it and therefore constantly switch from one sub-domain to another, i.e., from one process to another. This issue, which need not be addressed in serial computations, i.e., with a non-partitioned computational domain, requires a special treatment in distributed computing. The crucial situation arises when a particle has a geometric overlap with different sub-domains, i.e., geometrically exists on different processes. In contrast, when a particle is fully located on a single process, no special treatment is required. To handle all possible situations, i.e., both when a particle is located on a single sub-domain and when it intersects several sub-domains, we define the following strategy:

(1) all sub-domains, i.e., all processes, have access to all particle features: density, position, velocity, shape and configuration. Particles are dynamically created and destroyed at each time step. Creation is based on data provided by the granular solver, and destruction takes place at the end of each time step;

(2) for each particle, each process has access to the DLM/FD points that are actually geometrically located on its sub-domain. Therefore, for each particle, each process computes velocity constraints of type (5) on its own DLM/FD points (a minimal sketch of this ownership test is given at the end of this subsection). We shall see in the next section how we sum, for each particle, the contributions of all processes;

(3) particles that are shared between sub-domains are tagged and handled by the master process.

In Figs. 2 and 3, we illustrate how the DLM/FD points covering each particle are distributed among the processes in the two situations described above.

A first appraisal of the suggested strategy might highlight its lack of efficiency in terms of parallelization. In fact, as the size of the problem increases, the memory load of particles on each process does not scale, since each process stores data for all particles. However, this observation should be balanced by the two following remarks:

(1) even if we claim that the number of particles simulated in this work is beyond the current literature (except for the work of Jin et al. [11]), this number is still limited and the corresponding memory storage for 10,000 particles does not exceed a few megabytes;

Fig. 2 Distribution of DLM/FD points in the case of a circular particle located on a single sub-domain: process 1 alone features the whole set of DLM/FD points. a Serial and b on four sub-domains


Fig. 3 Distribution of DLM/FD points in the case of a triangular particle overlapping several sub-domains: each process features a part of the set of DLM/FD points. a Serial and b on four sub-domains

(2) the summation of contributions from all processes for each shared particle implies some "all to master" communications; having all particles defined on each process simplifies the implementation of these MPI communications.
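A minimal sketch of the ownership logic described in this subsection is given below, under the assumption of axis-aligned box sub-domains and a NumPy array of DLM/FD point coordinates (neither of which corresponds to the actual PELICANS data structures): it assigns the points of one particle to processes and flags the particle as shared when its points fall on more than one sub-domain.

```python
import numpy as np

def distribute_dlmfd_points(points, subdomain_boxes):
    """Assign the DLM/FD points of one particle to sub-domains (processes).

    points          : (n, d) array of DLM/FD point coordinates
    subdomain_boxes : list of (lower, upper) corner pairs, one per process,
                      assumed to tile the computational domain
    Returns a dict {process rank: indices of the points it handles} and a
    flag telling whether the particle is shared between several processes.
    """
    owners = {}
    for rank, (lo, hi) in enumerate(subdomain_boxes):
        lo, hi = np.asarray(lo), np.asarray(hi)
        inside = np.all((points >= lo) & (points < hi), axis=1)
        idx = np.nonzero(inside)[0]
        if idx.size > 0:
            owners[rank] = idx
    shared = len(owners) > 1   # shared particles are handled by the master
    return owners, shared

# Example: a 2D particle discretized by 4 points on a domain split into 2 boxes
pts = np.array([[0.4, 0.5], [0.6, 0.5], [0.5, 0.4], [0.5, 0.6]])
boxes = [((0.0, 0.0), (0.5, 1.0)), ((0.5, 0.0), (1.0, 1.0))]
owners, shared = distribute_dlmfd_points(pts, boxes)
print(owners, shared)   # points split between ranks 0 and 1 -> shared == True
```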

4.2 MPI solution of the DLM/FD saddle-point

Assuming that in the operator-splitting algorithm the DLM/FD problem is the fourth sub-problem, it reads: find $u^{n+4/5} \in V_{\Gamma_0}(\Omega)$, $U^{n+4/5} \in \mathbb{R}^d$, $\omega^{n+4/5} \in \mathbb{R}^{\tilde d}$ and $\lambda \in \Lambda(t)$ such that
$$\int_\Omega \frac{u^{n+4/5} - u^{n+3/5}}{\Delta t} \cdot v \,\mathrm{d}x + \int_{P(t)} \lambda \cdot v \,\mathrm{d}x = 0, \quad \forall v \in V_0(\Omega), \tag{17}$$
$$(\rho_r - 1)\left[V_P \frac{U^{n+4/5} - U^{n+3/5}}{\Delta t} \cdot V + I_P \frac{\omega^{n+4/5} - \omega^{n+3/5}}{\Delta t} \cdot \xi\right] = \int_{P(t)} \lambda \cdot (V + \xi \times r_{GM}) \,\mathrm{d}x, \quad V \in \mathbb{R}^d,\ \xi \in \mathbb{R}^{\tilde d}, \tag{18}$$
$$\int_{P(t)} \alpha \cdot \left(u^{n+4/5} - \left(U^{n+4/5} + \omega^{n+4/5} \times r_{GM}\right)\right) \mathrm{d}x = 0, \quad \forall \alpha \in \Lambda(t). \tag{19}$$

At the matrix level, the above constrained problem takes the following form:
$$\begin{bmatrix} A_u & & & M_u^t \\ & A_U & & M_U^t \\ & & A_\omega & M_\omega^t \\ M_u & M_U & M_\omega & \end{bmatrix} \begin{bmatrix} x_u \\ x_U \\ x_\omega \\ \Lambda \end{bmatrix} = \begin{bmatrix} f_u \\ f_U \\ f_\omega \\ g \end{bmatrix}, \tag{20}$$
where $A_u$ is an $N \times N$ symmetric positive definite matrix, $A_U$ and $A_\omega$ are $L \times L$ diagonal matrices with $L = N_p d$ and $N_p$ the total number of particles, $M_u$, $M_U$ and $M_\omega$ are $M \times N$, $M \times L$ and $M \times L$ rectangular matrices, respectively, $\{x_u, f_u\} \in \mathbb{R}^N \times \mathbb{R}^N$, $\{x_U, f_U, x_\omega, f_\omega\} \in \mathbb{R}^L \times \mathbb{R}^L \times \mathbb{R}^L \times \mathbb{R}^L$ and $\{\Lambda, g = 0\} \in \mathbb{R}^M \times \mathbb{R}^M$, and formally we have the following correspondence between vectors at the matrix level and linear functionals of test functions or test vectors in the weak formulation:

$$A_u x_u \Leftrightarrow \int_\Omega \frac{u^{n+4/5}}{\Delta t} \cdot v \,\mathrm{d}x, \tag{21}$$
$$A_U x_U \Leftrightarrow (\rho_r - 1) V_P \frac{U^{n+4/5}}{\Delta t} \cdot V, \tag{22}$$
$$A_\omega x_\omega \Leftrightarrow (\rho_r - 1) I_P \frac{\omega^{n+4/5}}{\Delta t} \cdot \xi, \tag{23}$$
$$M_u x_u \Leftrightarrow \langle \alpha, u^{n+4/5} \rangle_{P(t)} = \int_{P(t)} \alpha \cdot u^{n+4/5} \,\mathrm{d}x, \tag{24}$$
$$M_U x_U \Leftrightarrow -\langle \alpha, U^{n+4/5} \rangle_{P(t)} = -\int_{P(t)} \alpha \cdot U^{n+4/5} \,\mathrm{d}x, \tag{25}$$
$$M_\omega x_\omega \Leftrightarrow -\langle \alpha, \omega^{n+4/5} \times r_{GM} \rangle_{P(t)} = -\int_{P(t)} \alpha \cdot (\omega^{n+4/5} \times r_{GM}) \,\mathrm{d}x, \tag{26}$$
$$f_u \Leftrightarrow \int_\Omega \frac{u^{n+3/5}}{\Delta t} \cdot v \,\mathrm{d}x, \tag{27}$$
$$f_U \Leftrightarrow (\rho_r - 1) V_P \frac{U^{n+3/5}}{\Delta t}, \tag{28}$$
$$f_\omega \Leftrightarrow (\rho_r - 1) I_P \frac{\omega^{n+3/5}}{\Delta t}. \tag{29}$$

When matrices and vectors are gathered as
$$A = \begin{bmatrix} A_u & & \\ & A_U & \\ & & A_\omega \end{bmatrix}, \quad M^t = \begin{bmatrix} M_u^t \\ M_U^t \\ M_\omega^t \end{bmatrix}, \quad x = \begin{bmatrix} x_u \\ x_U \\ x_\omega \end{bmatrix}, \quad f = \begin{bmatrix} f_u \\ f_U \\ f_\omega \end{bmatrix}, \tag{30}$$
the matrix system (20) can be re-written in the following classical form:
$$\begin{bmatrix} A & M^t \\ M & \end{bmatrix} \begin{bmatrix} x \\ \Lambda \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}. \tag{31}$$

As originally suggested by Glowinski et al. in [1], as well as in our previous work [3,14], the matrix system (20) (or equivalently (31)) is solved by an Uzawa/CG algorithm. Details of this iterative procedure in a basic serial implementation are given in Appendix A. We detail below the parallel version based on our particle-distribution strategy. In addition, the two other ingredients of our parallel Uzawa/CG algorithm are:

(1) the solutions of the systems $A_U x_U = f_U - M_U^t \Lambda$ and $A_\omega x_\omega = f_\omega - M_\omega^t \Lambda$ for each particle are computed by the process that hosts the particle, in the case of particles located on a single sub-domain, and by the master process for shared particles;

(2) the matrices $M_u$, $M_U$ and $M_\omega$ are actually never assembled and the scalar products (24), (25) and (26) are computed in a matrix-free fashion on each particle. For each particle, the scalar product for the fluid velocity $\langle \lambda, v \rangle_{P(t)} = \int_{P(t)} \lambda \cdot v \,\mathrm{d}x \Leftrightarrow M_u^t \Lambda$ is computed on the DLM/FD points handled by each process and distributed on the fluid velocity nodes located on its sub-domain (these operations are performed by PELICANS and PETSc). In contrast, for shared particles, we need the complete scalar products $\langle \lambda, V \rangle_{P(t)} \Leftrightarrow M_U^t \Lambda$ and $\langle \lambda, \xi \times r_{GM} \rangle_{P(t)} \Leftrightarrow M_\omega^t \Lambda$ to invert the aforementioned systems $A_U x_U = f_U - M_U^t \Lambda$ and $A_\omega x_\omega = f_\omega - M_\omega^t \Lambda$. To perform this task, we note that:
$$\langle \lambda, V \rangle_{P(t)} = \sum_{\text{all procs}\,\cap\,P(t) \neq \emptyset} \langle \lambda, V \rangle_{P(t)}, \tag{32}$$
$$\langle \lambda, \xi \times r_{GM} \rangle_{P(t)} = \sum_{\text{all procs}\,\cap\,P(t) \neq \emptyset} \langle \lambda, \xi \times r_{GM} \rangle_{P(t)}. \tag{33}$$

Let us consider the translational velocity (same holds for the angular velocity). In Fig. 2, we have:


$$\langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 0} = \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 2} = \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 3} = 0, \quad \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 1} \neq 0. \tag{34}$$
Hence, all operations are handled by Proc 1 and no MPI communication is required. In contrast, in Fig. 3, we have:
$$\langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 0} \neq 0, \quad \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 1} \neq 0, \quad \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 2} \neq 0, \quad \langle \lambda, V \rangle_{P(t),\,\mathrm{Proc}\ 3} \neq 0. \tag{35}$$

This particle is shared and contributions from processes that own it are summed on the master process.

Finally, the extension of the Uzawa/CG algorithm described in Appendix A is fairly straightforward. Basically, it amounts to adding two communication steps to the algorithm:

(1) for the two-step sequence "compute $\langle \lambda, V \rangle_{P(t)} \Leftrightarrow M_U^t \Lambda$ and solve $A_U x_U = M_U^t \Lambda$" (the same holds for the angular velocity):

• On all processes $j$:
  – for each particle $i$: compute $h_{i,j} \in \mathbb{R}^d$, $h_{i,j} = \langle \lambda, V \rangle_{P_i(t),\,\mathrm{Proc}\ j}$,
  – for each particle $i$ fully located on $j$: invert $A_{U_i} x_{U_i} = h_i$, $A_{U_i} \in \mathbb{R}^{d \times d}$, $\{x_{U_i}, h_i\} \in \mathbb{R}^d \times \mathbb{R}^d$, where $h_i = \langle \lambda, V \rangle_{P_i(t)}$;

• On the master process only:
  – for each shared particle $i$: compute $h_i = \langle \lambda, V \rangle_{P_i(t)} = \sum_j \langle \lambda, V \rangle_{P_i(t),\,\mathrm{Proc}\ j} = \sum_j h_{i,j}$ (this operation is achieved in MPI by an MPI_sum routine),
  – then compute $x_{U_i} = A_{U_i}^{-1} h_i$;

• On all processes $j$:
  – for each shared particle $i$: send the updated value $x_{U_i}$ from the master process to all other processes (this operation is achieved in MPI by an MPI_broadcast routine);

(2) for the computation of the residual norm $\|r\|$ with $r = M_u x_u + M_U x_U + M_\omega x_\omega - g \Leftrightarrow \langle \alpha, u - (U + \omega \times r_{GM}) \rangle_{P(t)}$:

• On all processes $j$:
  – for each particle $i$: compute $r_{i,j} \in \mathbb{R}^d$, $r_{i,j} = \langle \alpha, u - (U + \omega \times r_{GM}) \rangle_{P_i(t),\,\mathrm{Proc}\ j}$;

• On the master process only:
  – for each particle $i$: compute $r_i \in \mathbb{R}^d$, $r_i = \langle \alpha, u - (U + \omega \times r_{GM}) \rangle_{P_i(t)} = \sum_j \langle \alpha, u - (U + \omega \times r_{GM}) \rangle_{P_i(t),\,\mathrm{Proc}\ j} = \sum_j r_{i,j}$ (this operation is achieved in MPI by an MPI_sum routine),
  – then compute $\|r\| = \sqrt{\sum_{P_i(t)} r_i \cdot r_i}$;

• On all processes $j$:
  – for each particle $i$: send the updated value $\|r\|^2$ from the master process to all other processes (this operation is achieved in MPI by an MPI_broadcast routine).

This strategy is easy to implement and employs only "master to all" and "all to master" communications for the particles. It should be emphasized that the solution of the systems involving the matrices $A_{U,i}$ and $A_{\omega,i}$ for shared particles does not scale with the size of the problem or the number of processes, since this step is performed on the master process only. However, the matrices $A_{U,i}$ and $A_{\omega,i}$ are diagonal and the number of shared particles is low compared to the total number of particles, hence there is basically nothing to solve and this step is nearly costless in the whole computing time. We will show in Sect. 5 that the suggested strategy exhibits decent scalable properties.
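The communication pattern described above reduces, for each shared particle, to a sum of per-process contributions onto the master process followed by a broadcast of the updated quantity. The fragment below illustrates that pattern with mpi4py for the translational-velocity update; the variable names (h_local for the per-process scalar product, A_U_diag for the diagonal of $A_{U,i}$) are assumptions made for the example, and PeliGRIFF itself implements this in C++ on top of PELICANS rather than in Python.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
master = 0

# Per-process contribution <lambda, V>_{P_i(t), Proc j} for one shared
# particle (here a dummy value standing in for the real scalar product).
h_local = np.full(3, float(rank + 1))

# "All to master": h_i = sum_j h_{i,j}, the MPI_sum reduction of the text.
h_i = np.zeros_like(h_local)
comm.Reduce(h_local, h_i, op=MPI.SUM, root=master)

x_U = None
if rank == master:
    # A_{U,i} is diagonal, so "inverting" it is an element-wise division.
    A_U_diag = np.array([2.0, 2.0, 2.0])
    x_U = h_i / A_U_diag

# "Master to all": broadcast the updated translational velocity x_{U_i}.
x_U = comm.bcast(x_U, root=master)
print(f"rank {rank}: x_U = {x_U}")
```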

4.3 Overlapping zones and DLM/FD points strategy in case of collisions or near-collisions

Fig. 4 Distribution of DLM/FD points in the case of two particles in near-collision: overlapping elements (in blue) on each sub-domain permit the use of the same construction strategy; plain circle symbols represent the constrained DLM/FD points, square symbols the non-constrained DLM/FD points used in the construction method only. a Serial, b on two sub-domains without overlapping elements and c on two sub-domains with overlapping elements (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

In the collocation-point version of the DLM/FD method we suggest here, each particle is discretized by a set of points, distributed as uniformly as possible over the particle bulk and surface. The convergence rate of the Uzawa/CG algorithm advocated in the previous section for the solution of the DLM/FD saddle-point problem is linked to the structure of the constraint matrix associated with the rigid-body-motion constraint, which is in turn based on the set of DLM/FD points. To guarantee a fair convergence of the Uzawa/CG algorithm while keeping a reasonable accuracy of the computed solution, we provided in [3] rules to construct a proper set of DLM/FD points, both for a single particle far away from any solid wall and for colliding or nearly colliding particles. Although the construction strategy has been tested with success in 2D situations only, it remains valid in 3D. However, its generalisation to distributed computing is not as straightforward as it might appear at first glance, mainly because particles entirely located on different processes do not "see" each other, and particles split over different processes do not "see" their DLM/FD points located on other processes, if we bluntly apply the strategy recommended in [3].

A first, naive idea would be to share the DLM/FD points of all particles between all processes. This strategy would be highly inefficient and its scalability awfully low. The key observation for devising an efficient strategy is that, in order to satisfy the construction rules listed in [3], particles only need to "see" other particles within a region extending one grid size beyond their boundary. To achieve this, we simply employ overlapping layers of elements at the interfaces between sub-domains, i.e., processes. The width of this layer is one grid size, i.e., one element, and we define DLM/FD points in this layer in case of possible collisions or near-collisions. The purpose of these DLM/FD points located in the overlapping zones (depicted by a square symbol in Fig. 4) is simply to allow fulfilling, per process, i.e., without any additional communication, the construction method suggested in [3]. Once the constrained DLM/FD points (depicted by a plain circle symbol (•) in Fig. 4) are defined, no constraint is applied on the ones in the overlapping zones.

Figure 4 illustrates how the use of overlapping elements at the interface between sub-domains proves beneficial to the determination of the DLM/FD points in parallel. Figure 4a highlights how our strategy works in serial, i.e., without domain decomposition. Without overlapping elements, two near-colliding or colliding particles located on two sub-domains cannot "see" each other and the set of points determined in parallel does not match the one defined in serial. Indeed, the union of the sets of DLM/FD points on processes 0 and 1 in Fig. 4b differs from the set of DLM/FD points defined in serial in Fig. 4a. On the contrary, Fig. 4c shows that, with overlapping elements, the union of the sets of DLM/FD points on processes 0 and 1 matches the set of DLM/FD points defined in serial in Fig. 4a.

Although this section might sound too practical to some, we wish to underline that the parallel construction method suggested above and its ability to supply exactly the same set of DLM/FD points as in serial are crucial to the whole computation because: (i) it ensures that the code produces exactly the same results in serial and in parallel, and (ii) it scales well with the number of particles as well as with the number of processes.


5 Results

We focus here on sedimentation problems both in 2D and 3D, implying that the solid particles are heavier than the surrounding fluid. In addition, we assume that the fluid is Newtonian and that the flow is isothermal (see our previous work on the flow of particles in a non-Newtonian fluid [14,19] or with heat transfer [13]). However, both smoothly shaped (circular cylinders in 2D and spheres in 3D) and angular (polygons in 2D and polyhedra in 3D) particles are considered in the following simulations.

Our primary objective is to show that our parallel DEM-DLM/FD solution method is well suited to tackle particulate flows with a significant number of particles in the flow domain. We demonstrate that the computed solutions prove to be of reasonable accuracy, although the mesh around the particles is not body-fitted, unlike in some other fictitious domain methods. We study the scalability of the algorithm with the number of processes for a large number of particles, a property essential to performing efficient large-scale computations. We also show that, thanks to our DEM solver, we are able to properly treat collisions between particles of arbitrary (at least convex) shape.

To account for the particle shape, we use Haider and Levenspiel's sphericity concept [20]:
$$\psi = \frac{S_s^*}{S_p^*}, \tag{36}$$
where $S_s^*$ and $S_p^*$ stand for the boundary areas of the sphere and of the polyhedron, respectively, assuming that the sphere and polyhedron volumes are identical.

In 2D, as in [3], we introduce a similar concept, the circularity:
$$\psi = \frac{P_d^*}{P_p^*}, \tag{37}$$
where $P_d^*$ and $P_p^*$ stand for the perimeters of the disk and of the polygon, respectively, assuming that the disk and polygon areas are identical.
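As an illustration of definitions (36) and (37), the short snippet below evaluates the circularity of a square and the sphericity of a cube from elementary geometry; it recovers the value $\psi \simeq 0.886$ quoted later for square particles. The function names are invented for the example.

```python
import math

def circularity_square(a=1.0):
    """Eq. (37): perimeter of the disk with the same area as the square,
    divided by the perimeter of the square."""
    area = a * a
    d = 2.0 * math.sqrt(area / math.pi)   # equivalent disk diameter
    return (math.pi * d) / (4.0 * a)

def sphericity_cube(a=1.0):
    """Eq. (36): surface of the sphere with the same volume as the cube,
    divided by the surface of the cube."""
    volume = a ** 3
    d = (6.0 * volume / math.pi) ** (1.0 / 3.0)   # equivalent sphere diameter
    return (math.pi * d * d) / (6.0 * a * a)

print(round(circularity_square(), 3))   # 0.886 for the square particles below
print(round(sphericity_cube(), 3))      # about 0.806 for a cube
```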

In the sedimentation problem, the motion of the solid bodies is driven by the density difference with the fluid. The choice of the velocity scale $U_c$ and length scale $L_c$ used to non-dimensionalize the results is not straightforward. Since inertia is non-negligible in all cases considered in this paper, we suggest estimating $U_c$ from a balance between inertia and buoyancy [3,21]. For all particle shapes we choose $L_c = D_e^*$, where $D_e^*$ is the diameter of the sphere (respectively circular cylinder) having the same volume (respectively area) as the particle. Therefore, we have:
$$\text{in 2D:} \quad \frac{\rho_f^* D_e^* U_{c0}^2}{2} = \frac{\pi D_e^{*2}}{4} \left(\rho_s^* - \rho_f^*\right) g^* \ \Rightarrow\ U_{c0} = \sqrt{\frac{\pi D_e^*}{2}\, \frac{\rho_s^* - \rho_f^*}{\rho_f^*}\, g^*}, \tag{38}$$
$$\text{in 3D:} \quad \frac{\rho_f^* \pi D_e^{*2} U_{c0}^2}{8} = \frac{\pi D_e^{*3}}{6} \left(\rho_s^* - \rho_f^*\right) g^* \ \Rightarrow\ U_{c0} = \sqrt{\frac{4 D_e^*}{3}\, \frac{\rho_s^* - \rho_f^*}{\rho_f^*}\, g^*}. \tag{39}$$

For multi-body sedimentation problems, we correct the velocity scale $U_c$ by the Richardson–Zaki law for hindered settling and thus obtain:
$$U_c = U_{c0}(1 - \phi)^{4.5}, \tag{40}$$
where $\phi$ is the solid surface (2D) or volume (3D) fraction.
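For reference, here is a small helper that evaluates the velocity scale of Eqs. (38)–(40). It is a direct transcription of the formulas above under the stated notation; SI units and the example values (a 1 cm particle with density ratio 1.1 in water) are chosen only to match the orders of magnitude discussed later.

```python
import math

def velocity_scale(D_e, rho_s, rho_f, g=9.81, dim=3, phi=0.0):
    """U_c from Eqs. (38)-(40): inertia/buoyancy balance, corrected by the
    Richardson-Zaki hindered-settling factor (1 - phi)**4.5."""
    dr = (rho_s - rho_f) / rho_f
    if dim == 2:
        u_c0 = math.sqrt(math.pi * D_e / 2.0 * dr * g)   # Eq. (38)
    else:
        u_c0 = math.sqrt(4.0 * D_e / 3.0 * dr * g)       # Eq. (39)
    return u_c0 * (1.0 - phi) ** 4.5                     # Eq. (40)

# Example: a 1 cm particle, density ratio 1.1, in water, at phi = 0.36
print(velocity_scale(D_e=0.01, rho_s=1100.0, rho_f=1000.0, dim=2, phi=0.36))
```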

5.1 Large scale computations in 2D

Our method and code have been convincingly validated in 2D for both single- and multi-particle cases in [3], and we investigate here the collective behavior of a large number of settling particles. Our reference case is the one investigated by Glowinski et al. in [5]: 6,400 circular cylinders settling in a 96 × 144 closed box at Re ≈ 100 and φ = 0.36. In order to assess the performance of our parallel simulation method, we re-compute a case similar to that of [5]. In all cases, we employ a structured mesh with h = 1/16, i.e., 16 velocity grid nodes on the particle diameter. The dimensionless time step is set to $\Delta t = 2 \times 10^{-3}$. Our goal is threefold: (i) to compare the computing time of our parallel code to that needed by the serial code of [5] in 2001, (ii) to evaluate the scalability of our solver in a case of moderately high solid surface fraction and (iii) to show that, beyond the current literature, we are able to perform large-scale computations for angularly shaped particles (squares here). With h = 1/16, the mesh comprises 1,769,472 elements, for a total of 886,657 degrees of freedom (dof) for the pressure field and 7,085,570 for the velocity field.

We perform runs with 1, 2, 4, 8, 16 and 32 processes. As a preliminary remark, we wish to point out that, for practical reasons, these computations are run on a cluster where we share multi-threaded nodes with other applications, and that the figures given below should not be taken as very sharp; they simply give an indication of the scalability of the code. Besides, scalability surveys are very sensitive to the following features:

• the architecture of the cluster: single- or multi-threaded nodes, Ethernet or InfiniBand communication protocol, cache memory level of the processors, …;

• the MPI library (here we use the free OpenMPI distribution);
• the type of problem;
• the load balancing between processes;
• …

For this test case, we purposely devise a situation where load balancing is not an issue. In fact, we employ a structured, constant-grid-size mesh and decompose the domain into vertical rectangles whose height is the total height of the domain and whose width is the total width of the domain divided by the number of processes. Besides, the solid fraction is set high enough (φ = 0.36) that the solid–solid interactions and the DLM/FD problem have a significant impact on the total computing time. Other scalability tests with smaller solid fractions, not reported here for the sake of conciseness, enabled us to assess the parallel performance of the fluid solver alone.

In [5], the computational features of the problem are clearly supplied to the reader: grid size, time step and computing time. However, the physical parameters are not very clear, in particular because units are not provided with dimensional parameters like the viscosity. This prevents us from exactly reproducing the same physical situation. Therefore, we decided to conform to the computational features and to set the physical parameters such that $Re_{c,0} = 1{,}241$ and $Re_c = 166$. These values, higher than in [5], give rise to stronger and larger hydrodynamic instabilities, as we shall see below. The scalable properties of PeliGRIFF are assessed with circular particles ($\psi = 1$). Then we compute a case with square particles ($\psi = 0.886$) to illustrate the effect of shape on the flow pattern.

Firstly, let us compare the performance of the 2001 code of Glowinski et al. [5] to that of our MPI-based code PeliGRIFF. As mentioned in [5], the computing cost per time step was approximately 10 min on a DEC Alpha workstation. On our 32-process run, the computing cost per time step of PeliGRIFF is around 20 s, i.e., a ratio of around 30. Accounting for the progress made in hardware and for the scalable properties of our code, which might still be enhanced (the scaling factor being 0.5 with 32 processes), our MPI-based code gives us the ability to speed up the computations by a factor in the range 10–20, which is indeed considerable progress. In total, Glowinski et al. [5] needed a full week per time unit, i.e., around 13 weeks to simulate the 13 time units of the whole process, whereas we perform around 60 time units in 4 days.

We now analyse in detail the performance of our code PeliGRIFF. Figures 5 and 6 plot the scaling factor by sub-problem and the total scaling factor, respectively. The scaling factor $\chi$ is used as a measure of the parallel efficiency and is usually defined as:
$$\chi = \frac{\mathrm{time}_{\mathrm{serial}}}{N \cdot \mathrm{time}_N}, \tag{41}$$
where $\mathrm{time}_{\mathrm{serial}}$ denotes the computing time on one process (serial run), $N$ the number of processes and $\mathrm{time}_N$ the computing time on an $N$-process run (for instance, if a serial run takes 320 s per time step and a 32-process run takes 20 s, then $\chi = 320/(32 \times 20) = 0.5$). A first glimpse at these plots indicates that the parallel efficiency drops quickly for a low number of processes (between 2 and 8) and then seems to reach a plateau at 0.5. The outcome is mixed. On the negative side, the overall parallel efficiency, although not very low, is around 0.5 for large-scale computations. On the positive side, we do not see any collapse of the parallel efficiency for a large number of processes, as underlined in Fig. 6b.


Fig. 5 Scalable properties of PeliGRIFF on a 6,400-circular-cylinder settling process with around 7 million dof for the velocity field, for the different sub-problems solved at each time step. a DEM solver, b advection–diffusion problem, c Stokes problem and d DLM/FD problem

Fig. 6 Scalable properties of PeliGRIFF on a 6,400-circular-cylinder settling process with about 7 million dof for the velocity field: total computing time. a Serial time as reference and b eight-process time as reference

Such a collapse is often observed with poorly devised parallel strategies. We believe that this preliminary test of the scalable properties of our code is promising, in the sense that the strategy recommended here performs decently for a large number of processes. It also proves that the use of a parallel Uzawa/CG algorithm to solve the DLM/FD saddle-point problem is feasible in the context of the DNS of particulate flows. In detail, Fig. 5d reveals that $\chi$ saturates at 0.6, which is deemed correct, and Fig. 5c shows that the Stokes problem is the one that scales the worst. To be complete on this scalability test, let us mention that the DEM solver uses ∼10%, the advection–diffusion problem ∼20%, the Stokes problem ∼25% and the DLM/FD problem ∼35% of the total computing time, the rest being miscellaneous tasks like result output. The lack of perfect scalability may be attributed to assorted reasons, among them:

• our programming of the code, which might certainly be improved;
• cluster-architecture-related issues, about which we cannot do very much;
• the efficiency of the PETSc linear solvers and routines, which seem to be responsible, at least partly if not mainly, for this mediocre performance. For all linear systems, convergence is deemed to be achieved when the residual norm is less than $10^{-10}$. The pressure Laplacian problem arising in the Stokes sub-problem is preconditioned by a BoomerAMG multigrid method, as mentioned in Sect. 4. Although the use of such a preconditioner is beneficial to the overall computing time, the HYPRE implementation employed in our code does not seem to scale very well. Concerning the solution of the advection–diffusion sub-problem, we cannot offer any convincing argument to explain why it does not scale better. In fact, we usually employ a simple Jacobi preconditioner, i.e., a diagonal one, and the matrix is highly diagonally dominant, so that we expected almost perfect scalability, i.e., $\chi$ close to 0.9–1. This is actually not the case and here we suspect the PETSc implementation to be the primary cause.

We highlight the flow features of the settling process of square particles ($\psi = 0.886$) in Figs. 8–10. As in [5], the particles are initially placed at the top of the box in an 80 × 80 regular lattice. Let us define the following Reynolds numbers:

• $\overline{Re}(t)$, the mean Reynolds number over all particles as a function of time:
$$\overline{Re}(t) = \frac{1}{6400} \sum_{i=1}^{6400} Re_i(t); \tag{42}$$

• $\overline{Re}$, the mean Reynolds number over all particles and the whole process:
$$\overline{Re} = \frac{1}{T_s} \int_0^{T_s} \overline{Re}(t)\,\mathrm{d}t, \tag{43}$$
where $T_s$ is the total duration of the process;

• $\overline{Re}_{\max}$, the maximum mean Reynolds number over all particles during the whole process:
$$\overline{Re}_{\max} = \max_{t \in [0:T_s]} \overline{Re}(t); \tag{44}$$

• $Re_{\max}$, the maximum Reynolds number over all particles during the whole process:
$$Re_{\max} = \max_{t \in [0:T_s]} \max_{i \in [1:6400]} Re_i(t). \tag{45}$$
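Given a record of the per-particle Reynolds numbers $Re_i(t)$ at discrete times, the diagnostics (42)–(45) reduce to simple averages and maxima; the few NumPy lines below make this explicit. The array layout (time samples on the first axis, particles on the second) and the synthetic data are assumptions of the example, not output of PeliGRIFF.

```python
import numpy as np

def reynolds_diagnostics(Re_it, t):
    """Eqs. (42)-(45) for Re_it of shape (n_times, n_particles), sampled at times t."""
    Re_mean_t = Re_it.mean(axis=1)             # Eq. (42): mean over particles
    Ts = t[-1] - t[0]
    Re_mean = np.trapz(Re_mean_t, t) / Ts      # Eq. (43): time average
    Re_mean_max = Re_mean_t.max()              # Eq. (44): max of the mean
    Re_max = Re_it.max()                       # Eq. (45): overall max
    return Re_mean_t, Re_mean, Re_mean_max, Re_max

# Tiny synthetic example with 3 particles and 4 time samples
t = np.array([0.0, 1.0, 2.0, 3.0])
Re_it = np.array([[10., 12., 11.], [20., 25., 22.], [30., 28., 33.], [15., 14., 16.]])
print(reynolds_diagnostics(Re_it, t)[1:])
```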

The time evolution of the particle positions together with the pressure contours is shown in Fig. 7. From Fig. 10, we have $\overline{Re} \sim 200$ and $\overline{Re}_{\max} \sim 600$. However, some particles reach a very high velocity at some point in the process, in particular those conveyed in the large vortices that manifest at t = 16.66, as shown in Fig. 8d, such that $Re_{\max} \sim 5{,}000$ for square particles and $Re_{\max} \sim 10{,}000$ for circular ones. Obviously, t = 16.66 is the time that exhibits the highest velocity, since it coincides with the break-up of the pack of particles and the development of large recirculation cells. In our computations, the hydrodynamic instabilities created by the particle motion are much larger than the ones observed by Glowinski et al. [5] and Pan et al. [6], in agreement with the larger Reynolds number considered here. In dimensional units, this might represent particles of diameter 1 cm settling in water (whose viscosity is μ = 0.001 Pa s). The magnitude of the various representative Reynolds numbers in this simulation, as well as the large vortices observed, indicates that the settling process mimics a kind of turbulent mixing phenomenon. This is all the more emphasized by the box-size-based Reynolds number $Re_{\mathrm{box}}$, which, based on the box width, is simply 96 times larger than $\overline{Re}$, i.e., equal to ∼20,000.

Many pieces of information can be extracted from such a simulation; among them, we wish to make the following remarks:


Fig. 7 Settling of 6,400 square particles in a rectangular enclosure at $(Re_c, \rho_r, \phi, \psi) = (166, 1.1, 0.36, 0.886)$: pressure contours (from blue (lowest pressure) to red (highest pressure) over the whole process) and particle positions. a t = 0, b t = 3.33, c t = 8.33, d t = 16.66, e t = 25, f t = 33.3, g t = 50 and h t = 66.6 (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

• as in our previous work [3], we believe it is fair to point out that the initial layout of the particles forces a Rayleigh–Taylor-like instability to develop at this level of Reynolds number. In fact, the whole pack of particles can actually settle only if it breaks up and/or destabilizes. Starting from a homogeneous layout of particles, as shown in [3], would yield a different flow pattern;

• it is worth mentioning that, as expected, the eyes of the large vortices coincide with low-pressure and low-solid-concentration regions. This is clearly visible at t = 16.66 and t = 33.3 in Figs. 8d–7d and Figs. 8f–7f, respectively. The low solid concentration in the eye of a vortex stems from the swirling flow pattern, which tends to eject any object out of it, as is common in vortex dynamics;

• in a multi-scale framework, we purposely did not plot the particle positions in Fig. 8. In fact, we believe that the ultimate goal of large-scale particulate-flow computations is no longer to look at individual particle motion but to analyse the flow in terms of the large structures that involve hundreds to thousands of particles, i.e., at a higher scale. This simulation is an illustration of how to bridge the gap between the microscopic and mesoscopic (or even macroscopic) scales;

Fig. 8 Settling of 6,400 square particles in a rectangular enclosure at $(Re_c, \rho_r, \phi, \psi) = (166, 1.1, 0.36, 0.886)$: velocity vectors and magnitude (from blue (0) to red (maximum velocity over the whole process)). a t = 0, b t = 3.33, c t = 8.33, d t = 16.66, e t = 25, f t = 33.3, g t = 50 and h t = 66.6 (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

• we did mention above that our computed solution is of reasonable accuracy. It has already been shown for single-particle simulations in our previous work [3,14,19] and by other contributions [5] that the DLM/FD coupling method provides reliable solutions, but we show it once more in Fig. 9 by zooming in on a sub-region of the flow. Figure 9 also shows the DLM/FD points representing the particles on the grid, emphasizing its fineness, which guarantees reasonable accuracy;

• Figure 10 compares the time evolution of the mean vertical velocity of the suspension for circular ($\psi = 1$) and square ($\psi = 0.886$) particles. As already observed for a lower-Reynolds-number settling process in [3], the total settling time of angular particles like squares is slightly longer than that of circular particles. It is, however, noticeable that the break-up of the pack of circular particles leads to larger velocity under- and over-shoots, implying that circular particles create larger instabilities. This observation requires further investigation.

5.2 3D validation test: a single sphere settling steadily in a cuboid box

In this validation test, we examine the steady settling of a single sphere in a cuboid box. This situation has been investigated by ten Cate et al. [22] both experimentally and numerically with an LBM. It was later used by Veeramani et al. [12] as well as Feng and Michaelides [23] to validate their own particulate-flow codes.


Fig. 9 Settling of 6,400 square particles in a rectangular enclosure at (Re_c, ρ_r, φ) = (166, 1.1, 0.36) and circularity 0.886: zoom on a region of the flow at t = 16.6 showing the velocity vectors and velocity-magnitude contours (top right), and the corresponding DLM/FD points on the grid representing the particles (bottom right)

Fig. 10 Settling of 6,400 particles in a rectangular enclosure at (Re_c, ρ_r, φ) = (166, 1.1, 0.36): time evolution of the mean vertical velocity for circular (circularity 1) and square (circularity 0.886) particles

Indeed, this experiment was designed for code validation and is clearly a good candidate for the following two reasons: (i) the diameter-to-box-size ratio is small, so the mesh associated with the flow domain contains a reasonable number of elements, and (ii) the time scale of the process is short, hence a limited number of time steps is required to simulate the complete settling process.

The range of Reynolds number considered is Re ∈ [1.5, 31.9]; thus the particle, released at the center of the horizontal X–Y cross-section of the box, settles steadily downwards with a constant X–Y position. As suggested by Veeramani et al. [12], it is advantageous to exploit this property of constant cross-sectional position by building a mesh with a central column of width 2D*_e and relaxing the mesh towards the vertical solid walls of the box, in order to avoid an unnecessarily fine grid in these regions of the flow domain. Computations are performed on a mesh with h = 1/16 in the central column, implying that we use 16 velocity grid nodes over the particle diameter. The total number of elements in the mesh is 554,880, resulting in 99,225 and 2,299,563 dof for the pressure and velocity fields, respectively. Figure 11 shows the geometry together with the mesh in vertical Y–Z and horizontal X–Y cross-sectional planes. Note that, in the horizontal cross-section, the mesh is built symmetric in each quadrant with respect to the diagonal in order to preserve symmetry (as a result, we also ensure symmetry with respect to the X- and Y-axes) and prevent any artificial transversal motion of the particle.
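As an illustration of this meshing strategy, the sketch below builds the node coordinates of one horizontal axis: a uniform spacing inside a centered column of width 2D*_e and a geometric stretching towards the walls. It is a minimal, hypothetical sketch (the helper name graded_axis and the stretching ratio are illustrative assumptions, not the grading law used in the paper).

```python
import numpy as np

def graded_axis(box_width, column_width, h_fine, ratio=1.15):
    """1D node coordinates: uniform spacing h_fine inside a centered
    column of width column_width, then cells geometrically stretched
    by 'ratio' from the column edge to the wall on each side."""
    half_box, half_col = 0.5 * box_width, 0.5 * column_width
    n_fine = int(round(column_width / h_fine))
    core = np.linspace(-half_col, half_col, n_fine + 1)   # uniform column
    right, dx = [half_col], h_fine
    while right[-1] < half_box:                            # stretch to the wall
        dx *= ratio
        right.append(min(right[-1] + dx, half_box))
    right = np.array(right[1:])
    left = -right[::-1]
    return np.concatenate((left, core, right))

# Example: 100 mm wide box, D_e = 15 mm, 16 cells across the diameter
x = graded_axis(box_width=100.0, column_width=2 * 15.0, h_fine=15.0 / 16.0)
print(x.size, x.min(), x.max())   # node count, -50.0, 50.0
```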

The test case is well documented in [22] and we briefly recall below its dimensional features:

• solid sphere: D*_e = 15 mm and ρ*_s = 1,120 kg/m³;
• box dimensions: width × depth × height = 100 × 100 × 160 mm³;
• initial sphere position: X_0 = Y_0 = 50 mm, Z_0 = 120 mm.

In the experiment, the fluid properties (density and viscosity) are varied to yield four Reynolds numbers. In [22], a different velocity scale than (39) is employed. In fact, U_c is derived from the following drag-coefficient correlation:

$$
C_{d,c} = \frac{24}{(9.06)^2}\left(\frac{9.06}{\sqrt{Re_c}} + 1\right)^2, \qquad
Re_c = \frac{\rho_f^* D_e^* U_c}{\eta^*}, \qquad
C_{d,c} = \frac{\dfrac{\pi D_e^{*3}}{6}\left(\rho_s^* - \rho_f^*\right) g^*}{\dfrac{1}{2}\,\rho_f^*\,\dfrac{\pi D_e^{*2}}{4}\,U_c^{*2}}. \tag{46}
$$
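Since Re_c itself depends on U_c, (46) defines U_c implicitly; it can be obtained by a simple fixed-point iteration, as in the sketch below. The helper name settling_scales is hypothetical and the fluid properties are illustrative values only, not necessarily one of the four experimental fluids of ten Cate et al. [22].

```python
import numpy as np

def settling_scales(D, rho_s, rho_f, eta, g=9.81):
    """Velocity scale U_c and Reynolds number Re_c from the drag
    correlation (46): the buoyant weight of the sphere is balanced by
    the drag force, and Re_c depends on U_c, so the relation is solved
    by fixed-point iteration."""
    U = 0.01                                   # initial guess (m/s)
    for _ in range(200):
        Re = rho_f * D * U / eta
        Cd = 24.0 / 9.06**2 * (9.06 / np.sqrt(Re) + 1.0)**2
        # Cd * (1/2 rho_f pi D^2/4 U^2) = (pi D^3/6) (rho_s - rho_f) g
        U_new = np.sqrt(4.0 * D * (rho_s - rho_f) * g / (3.0 * rho_f * Cd))
        if abs(U_new - U) < 1e-12:
            break
        U = U_new
    return U, rho_f * D * U / eta

# Illustrative fluid properties (assumed, for demonstration only)
Uc, Rec = settling_scales(D=0.015, rho_s=1120.0, rho_f=970.0, eta=0.373)
print(Uc, Rec)   # roughly 0.038 m/s, with Re_c close to 1.5
```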

Figure 12 presents a comparison of our computed results with the experimental data of ten Cate et al. [22] for the four Reynolds numbers investigated, Re_c = 1.5, 4.1, 11.6 and 31.9.


Fig. 11 Single spherical particle settling in an enclosure, comparison with the experimental data of ten Cate et al. [22]: (a) geometry, (b) mesh in a Y–Z cut plane, (c) mesh in an X–Y cut plane

Fig. 12 Time evolution of the vertical settling velocity of a single sphere in an enclosure at various Reynolds numbers Re, comparison with the experimental data of ten Cate et al. [22] and the numerical results of Feng and Michaelides [23]

We also plot the recent results of Feng and Michaelides [23], who employed an IB/LBM with a special treatment of the no-slip boundary condition at the particle surface in order to provide very accurate results. Overall, our computed results agree well with the experimental data, the largest discrepancy being of the order of 5%. Our results are as accurate as those of Veeramani et al. (see [12, Figs. 2, 3]) but not as satisfactory as those of Feng and Michaelides [23]. However, we consider that the obtained agreement is a convincing validation and that our implementation is reliable enough to investigate the collective behavior of 3D suspension flows.

5.3 Large-scale computations in 3D

We further use our code to compute 3D sedimentation flows with both spheres and angular particles (here cubes). We consider the settling of 4,000 particles in a 12 × 12 × 48 cuboid box. The solid volume fraction is set to φ ≈ 0.3 and the Reynolds numbers to (Re_{c,0}, Re_c) = (144, 29). We construct an h = 1/10 uniform structured mesh that comprises 5,184,000 elements, resulting in 896,761 and 21,126,963 dof for the pressure and velocity fields, respectively. Computations are run with Δt = 3 × 10^{-3} for the two following situations:

(1) with spheres (sphericity 1), particles are initially seeded homogeneously in the flow domain on a regular lattice (see the sketch after this list);
(2) with cubes (sphericity 0.806), particles are initially located at the top of the box as a dense pack.
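A minimal sketch of such a homogeneous lattice seeding is given below; the helper name regular_lattice and the 10 × 10 × 40 particle counts are illustrative assumptions (with a dimensionless diameter of about 1, they give φ = 4,000 × π/6 / (12 × 12 × 48) ≈ 0.3), not PeliGRIFF's actual seeding routine.

```python
import numpy as np

def regular_lattice(box, counts, diameter):
    """Homogeneous initial layout: particle centers on a regular lattice
    of counts = (nx, ny, nz) cells filling the box = (Lx, Ly, Lz)."""
    (Lx, Ly, Lz), (nx, ny, nz) = box, counts
    xs = (np.arange(nx) + 0.5) * Lx / nx          # cell-center coordinates
    ys = (np.arange(ny) + 0.5) * Ly / ny
    zs = (np.arange(nz) + 0.5) * Lz / nz
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
    centers = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)
    # lattice spacing must exceed the particle diameter (no overlaps)
    assert min(Lx / nx, Ly / ny, Lz / nz) > diameter
    return centers

# 4,000 spheres of dimensionless diameter ~1 in the 12 x 12 x 48 box
centers = regular_lattice((12.0, 12.0, 48.0), (10, 10, 40), diameter=1.0)
print(centers.shape)   # (4000, 3)
```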


Fig. 13 Settling of 4,000 spheres in a 12 × 12 × 48 cuboid box at (Re_c, ρ_r, φ) = (29, 1.5, 0.3) and sphericity 1: particle positions and velocity magnitude in the 3D domain and in a Y–Z cut plane located at X = 5.4. (a) t = 0, (b) t = 3.63, (c) t = 7.26, (d) t = 10.89 and (e) t = 14.52

Both cases are run on 64 processes and the average computing cost per time step is around 1 min. Around 10,000 time steps are required to complete the settling process, hence each run takes around a full week on our Linux cluster. The time evolution of the settling process is presented in Figs. 13 and 14 for spheres and cubes, respectively. In both cases, we plot the evolution of the particle positions in the 3D domain and velocity contours in a cut plane.

For the spheres (sphericity 1), we show contours of the velocity magnitude in the Y–Z cut plane located at X = 5.4. We highlight that the hydrodynamic coupling breaks the initial structured layout of the particles while the fluid essentially flows upwards through the particles as through a porous medium. The high concentration φ = 0.3 and the initial layout constrain their transverse motion, and no particular large-scale hydrodynamic structure is visible in the flow. Essentially, particles settle with a limited transversal motion until they form a pack at the bottom of the box. The theoretical maximum packing of monodisperse spheres is φ_max = 0.68; hence, with φ = 0.3, the final pack of particles at the bottom of the box occupies at least 0.3/0.68 ≈ 0.44 of the box height. Figure 13e indicates that the particles occupy slightly more than this lower bound, as expected, since the final configuration of particles is obtained as a result of the settling process. This is in accordance with experimental observations, which underline that the maximum packing cannot result from an uncontrolled sedimentation. In fact, the theoretical maximum packing fraction φ_max = 0.68 corresponds to a crystalline tetrahedral network where all particle centers of mass are located at the nodes of the network.
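As a quick check of this bound for the 48-unit-tall box, writing H_bed for the final bed height and H_box for the box height:
$$
\frac{H_{\mathrm{bed}}}{H_{\mathrm{box}}} \ge \frac{\phi}{\phi_{\max}} = \frac{0.3}{0.68} \approx 0.44,
\qquad
H_{\mathrm{bed}} \ge 0.44 \times 48 \approx 21 \ \text{(dimensionless units)}.
$$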


Fig. 14 Settling of 4,000 cubes in a 12 × 12 × 48 cuboid box at (Re_c, ρ_r, φ) = (29, 1.5, 0.3) and sphericity 0.806: particle positions and vertical velocity component in the 3D domain and in a Y–Z cut plane located at X = 2.5. (a) t = 0, (b) t = 7.26, (c) t = 14.52, (d) t = 21.78 and (e) t = 36.3

For the cubes (sphericity 0.806), we show contours of the vertical velocity component in the Y–Z cut plane located at X = 2.5. The initial particle layout is top-packed, thus forcing a large-scale hydrodynamic instability to develop. During the early transients, the lower part of the pack plummets while the upper part settles more slowly as the fluid flows upwards through the pack. As the first particles reach the bottom of the box, a large vortex develops, with particles settling downwards on the right and fluid flowing upwards on the left. Simultaneously, the upper particles, which still settle as a pack up to this time, destabilize. The large recirculating vortex is strong enough to lift some particles on the left, although they are heavier than the fluid, as emphasized by the velocity vectors drawn in black in Fig. 14. Once all particles on the right have packed at the bottom of the box, the recirculation fades, the particles still in suspension settle down and the system returns to rest. This test case proves that our simulation method is able to describe a strong hydrodynamic coupling between the dispersed solid phase and the suspending fluid, as evidenced by the formation of a large recirculating vortex and the destabilisation (breaking) of the pack of particles. We already showed this phenomenon in our previous work [3], but in 2D systems with fewer particles. Qualitatively, the settling process is similar. However, the primary difference concerns the onset of the hydrodynamic instability and the pack breaking. In fact, in 2D, for obvious geometric reasons, the fluid cannot flow through the pack of particles and the onset of the hydrodynamic instability leading to the pack breaking occurs during the first stages of the process.


In contrast, in 3D, the fluid is able to flow through the pack of particles, which explains why the upper particles keep settling as a pack for a few time units and why the breaking of the pack is postponed to much later than in the 2D case.

6 Discussion and perspectives

We have suggested a parallel simulation method to examine particulate flows for a wide range of solid volume fractions and Reynolds numbers. Our method is an extension of our previous work [3] to parallel capabilities. In terms of numerical methods, we keep the same ingredients, i.e., a DEM solver to treat solid/solid contact dynamics and a DLM/FD method for the hydrodynamic coupling. However, we showed that the fully parallel implementation raises some new issues for the solution of the DLM/FD problem. To address them, we detailed how to parallelize the Uzawa/CG iterative method and how to construct the DLM/FD set of points so as to guarantee that the code produces the same solution as in serial computations. Our construction strategy has proved to be efficient, and the scalability of the whole method is deemed satisfactory at this point, although it can certainly be further improved.
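As a reminder of what the parallelization of the Uzawa/CG loop involves at its core, the sketch below shows the only global communication needed per iteration once the vectors are distributed over sub-domains: an allreduce for the dot products entering α and β. It is a minimal mpi4py illustration (the helper name global_dot is a hypothetical assumption), not an excerpt of PeliGRIFF.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def global_dot(a_local, b_local):
    """Dot product of vectors distributed over the sub-domains: each
    process contributes the entries it owns, and a single allreduce
    yields the global value on every process."""
    return comm.allreduce(np.dot(a_local, b_local), op=MPI.SUM)

# Each rank owns a chunk of the distributed residual r and direction w;
# in the Uzawa/CG loop, global_dot is called for r.r, w.y and ||r||^2.
rng = np.random.default_rng(comm.Get_rank())
r_local = rng.standard_normal(100_000)
w_local = rng.standard_normal(100_000)
print(comm.Get_rank(), global_dot(r_local, w_local))
```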

The use of a parallel solution method requires assessing its efficiency from a parallel viewpoint, where the primary concern is scalability. We believe that the DLM/FD method is a good candidate, since its fixed-mesh feature enables one to partition and distribute the fluid computational domain once and for all at the start of the computation; in most situations the domain is nearly equally distributed among the processes, which avoids load-balancing problems. This is, however, not the case for the particles, which travel from one sub-domain, i.e., one process, to another. For instance, in the 4,000-cube settling process, the particles are initially all located at the top and eventually all at the bottom, illustrating that the load balancing of the particles over the processes is not optimal. In some specific geometries there might be a way to overcome this problem (in the 4,000-cube settling process, one can, for instance, partition the domain in the X–Y plane only), but in general the imbalance persists. We have not tackled it yet. However, we estimated the scalability of our code, PeliGRIFF, on a 2D case involving 6,400 particles and showed that its scaling factor stabilizes around 0.5 for a large number of processes. This preliminary study of the scalability is a first indication that should, however, be put into perspective, as the problem features as well as the cluster architecture itself significantly affect the outcome. We are currently examining this aspect further by considering different problems and running PeliGRIFF on other clusters.

Nevertheless, as it stands, PeliGRIFF has supplied results that have never been reported elsewhere in the literature. In fact, results on the 3D settling of angularly shaped particles (cubes in our case) are, to the best of our knowledge, totally new. This opens up a broad new range of multiphase applications that can be tackled by DNS. Although our method seems well suited to this type of flow, we believe it is fair to mention that other approaches also have great potential for the parallel DNS of particulate flows, such as the IBM (see, for instance, the work of Uhlmann [8]). Essentially, we have demonstrated in this work the feasibility of large-scale parallel computations of particulate flows. Quantitative validations in single-particle cases confirm the reasonable accuracy of the computed solutions, but only a qualitative assessment for a large collection of particles has been provided. We believe it is now necessary to quantitatively validate our code on true suspension flows, in which the collective behavior dominates. At the computational level, only runs on at most 64 processes have been performed. With larger computing facilities, systems with O(100,000) particles in 2D and O(10,000) particles in 3D are deemed to be a reasonable objective.

Finally, we wish to emphasize that, although the extension to distributed computing offers new perspectives, the computing capabilities of PeliGRIFF, as well as those of similar methods published in the literature (see [8] or [11]), are still insufficient to simulate real-life systems involving millions of particles. At best, we are now able to model lab-scale experiments involving tens of thousands of particles. This gives researchers a new tool, not available previously, to better understand the collective behavior of particulate flows and to elucidate the intricate mechanisms involved. But even with ten- or a hundred-fold more powerful computing facilities, the parallel DNS of a million particles suspended in a fluid remains out of reach. Therefore, we believe that this type of simulation method needs to be integrated in a global multi-scale approach. New insight gained from results obtained with a DNS tool at the micro level would serve to enhance closure laws at the macro level and thus improve the predictions of more conventional simulation methods for real-life systems.


We strongly believe that the fully integrated multi-scale approach with three levels, namely DNS [3,7,8,11] at the microscopic level, micro/macro methods like two-way Euler/Lagrange coupling [9,10] at the mesoscopic level, and continuum mechanics with closure laws at the macroscopic level, is the path to follow in the future to extend our understanding of multiphase flows. A new challenge opens up for mathematical engineers: how to integrate at one level the knowledge supplied by the lower level?

Acknowledgements The author gratefully acknowledges Prof. Feng, University of North Texas, USA, and Prof. Michaelides, University of Texas at San Antonio, USA, for providing data for comparison purposes. The author wishes to thank the reviewers for their valuable comments, and in particular Reviewer 2 for his careful reading of the mathematical formulations and equations, which helped to sharpen this article. Finally, the author thanks Dr. Guillaume Vinay, IFP Energies Nouvelles, France, for his sharp remarks and helpful suggestions that contributed to improving this article.

Appendix A: Uzawa/conjugate-gradient algorithm

Below, the Uzawa/CG algorithm applied to the constrained problem (20)–(30) is described. Let us denote by $\varepsilon$ (> 0) the convergence criterion and by $k$ the iteration index. For notational convenience, we denote $N + 2L$ by $Q$.

• Initialisation ($k = 0$)

(1) Given $\lambda^{(0)} \in \mathbb{R}^M$, compute $q \in \mathbb{R}^Q$, i.e., $q_u \in \mathbb{R}^N$, $q_U \in \mathbb{R}^L$, $q_\omega \in \mathbb{R}^L$, such that:
$$q = f - M^t \lambda^{(0)} \;\Rightarrow\; \begin{cases} q_u = f_u - M_u^t \lambda^{(0)} \\ q_U = f_U - M_U^t \lambda^{(0)} \\ q_\omega = f_\omega - M_\omega^t \lambda^{(0)}; \end{cases} \tag{A.1}$$

(2) Find $x^{(0)} \in \mathbb{R}^Q$, i.e., $x_u^{(0)} \in \mathbb{R}^N$, $x_U^{(0)} \in \mathbb{R}^L$, $x_\omega^{(0)} \in \mathbb{R}^L$, such that:
$$A x^{(0)} = q \;\Rightarrow\; \begin{cases} x_u^{(0)} = A_u^{-1} q_u \\ x_U^{(0)} = A_U^{-1} q_U \\ x_\omega^{(0)} = A_\omega^{-1} q_\omega; \end{cases} \tag{A.2}$$

(3) Compute the residual $r^{(0)} \in \mathbb{R}^M$, with $y, y_u, y_U, y_\omega \in \mathbb{R}^M$, such that:
$$r^{(0)} = -M x^{(0)} + g, \tag{A.3}$$
$$\begin{cases} y_u = M_u x_u^{(0)} \\ y_U = M_U x_U^{(0)} \\ y_\omega = M_\omega x_\omega^{(0)}, \end{cases} \;\Rightarrow\; y = y_u + y_U + y_\omega, \tag{A.4}$$
$$r^{(0)} = g - y; \tag{A.5}$$

(4) Set $w^{(0)} \in \mathbb{R}^M$ such that:
$$w^{(0)} = r^{(0)}; \tag{A.6}$$

• Iterative procedure: for $k \ge 1$, while $\|r^{(k-1)}\| \ge \varepsilon$, do:

(1) Compute $q \in \mathbb{R}^Q$, i.e., $q_u \in \mathbb{R}^N$, $q_U \in \mathbb{R}^L$, $q_\omega \in \mathbb{R}^L$, such that:
$$q = M^t w^{(k-1)} \;\Rightarrow\; \begin{cases} q_u = M_u^t w^{(k-1)} \\ q_U = M_U^t w^{(k-1)} \\ q_\omega = M_\omega^t w^{(k-1)}; \end{cases} \tag{A.7}$$

(2) Find $t \in \mathbb{R}^Q$, i.e., $t_u \in \mathbb{R}^N$, $t_U \in \mathbb{R}^L$, $t_\omega \in \mathbb{R}^L$, such that:
$$A t = q \;\Rightarrow\; \begin{cases} t_u = A_u^{-1} q_u \\ t_U = A_U^{-1} q_U \\ t_\omega = A_\omega^{-1} q_\omega; \end{cases} \tag{A.8}$$

(3) Compute $y \in \mathbb{R}^M$, i.e., $y_u, y_U, y_\omega \in \mathbb{R}^M$, such that:
$$y = -M t \;\Rightarrow\; \begin{cases} y_u = -M_u t_u \\ y_U = -M_U t_U \\ y_\omega = -M_\omega t_\omega, \end{cases} \;\Rightarrow\; y = y_u + y_U + y_\omega; \tag{A.9}$$

(4) Compute $\alpha$ as:
$$\alpha = \frac{r^{(k-1)} \cdot r^{(k-1)}}{w^{(k-1)} \cdot y}; \tag{A.10}$$

(5) Update $\lambda^{(k)} \in \mathbb{R}^M$ and $r^{(k)} \in \mathbb{R}^M$ via:
$$\lambda^{(k)} = \lambda^{(k-1)} - \alpha w^{(k-1)}, \tag{A.11}$$
$$r^{(k)} = r^{(k-1)} - \alpha y; \tag{A.12}$$

(6) Compute $\|r^{(k)}\|$;

(7) Update $x^{(k)} \in \mathbb{R}^Q$, i.e., $x_u^{(k)} \in \mathbb{R}^N$, $x_U^{(k)} \in \mathbb{R}^L$, $x_\omega^{(k)} \in \mathbb{R}^L$, via:
$$x^{(k)} = x^{(k-1)} + \alpha t \;\Rightarrow\; \begin{cases} x_u^{(k)} = x_u^{(k-1)} + \alpha t_u \\ x_U^{(k)} = x_U^{(k-1)} + \alpha t_U \\ x_\omega^{(k)} = x_\omega^{(k-1)} + \alpha t_\omega; \end{cases} \tag{A.13}$$

(8) Compute $\beta$ as:
$$\beta = \frac{\|r^{(k)}\|^2}{\|r^{(k-1)}\|^2}; \tag{A.14}$$

(9) Update $w^{(k)} \in \mathbb{R}^M$ as:
$$w^{(k)} = r^{(k)} - \beta w^{(k-1)}. \tag{A.15}$$
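For readers who prefer pseudocode, the sketch below transcribes a serial Uzawa/CG iteration in Python/NumPy for a generic saddle-point system A x + Mᵗλ = f, M x = g, with dense matrices standing in for the block operators A_u, A_U, A_ω and M_u, M_U, M_ω. It follows the standard saddle-point sign convention, which may differ from that of problem (20)–(30), and is a minimal illustration rather than PeliGRIFF's distributed, matrix-free implementation.

```python
import numpy as np

def uzawa_cg(A, M, f, g, eps=1e-10, max_iter=500):
    """Uzawa/conjugate-gradient solver for the saddle-point system
        A x + M^T lam = f,   M x = g,
    with A symmetric positive definite. Dense-matrix sketch of the
    structure of Appendix A; signs follow the standard convention."""
    solve_A = np.linalg.solve                      # stands in for A^{-1}
    lam = np.zeros(M.shape[0])                     # initial multiplier
    x = solve_A(A, f - M.T @ lam)                  # x = A^{-1}(f - M^T lam)
    r = M @ x - g                                  # dual (constraint) residual
    w = r.copy()                                   # first search direction
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:
            break
        t = solve_A(A, M.T @ w)                    # t = A^{-1} M^T w
        y = M @ t                                  # y = S w, S = M A^{-1} M^T
        alpha = (r @ r) / (w @ y)
        lam += alpha * w                           # update multiplier
        x -= alpha * t                             # keep x = A^{-1}(f - M^T lam)
        r_new = r - alpha * y                      # update dual residual
        beta = (r_new @ r_new) / (r @ r)
        w = r_new + beta * w                       # new conjugate direction
        r = r_new
    return x, lam

# Tiny self-check against a direct solve of the full KKT system
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8)); A = B @ B.T + 8 * np.eye(8)   # SPD block
M = rng.standard_normal((3, 8))
f, g = rng.standard_normal(8), rng.standard_normal(3)
x, lam = uzawa_cg(A, M, f, g)
K = np.block([[A, M.T], [M, np.zeros((3, 3))]])
ref = np.linalg.solve(K, np.concatenate([f, g]))
print(np.allclose(np.concatenate([x, lam]), ref))  # True
```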

References

1. Glowinski R, Pan TW, Hesla TI, Joseph DD (1999) A distributed Lagrange multiplier/fictitious domain method for particulate flow. Int J Multiph Flow 25:755–794
2. Glowinski R (2003) Finite element methods for incompressible viscous flow. In: Ciarlet PG, Lions JL (eds) Handbook of numerical analysis, vol IX. North-Holland, Amsterdam, pp 3–1176
3. Wachs A (2009) A DEM-DLM/FD method for direct numerical simulation of particulate flows: sedimentation of polygonal isometric particles in a Newtonian fluid with collisions. Comput Fluids 38(8):1608–1628
4. Singh P, Hesla TI, Joseph DD (2003) Distributed Lagrange multiplier method for particulate flows with collisions. Int J Multiph Flow 29:495–509
5. Glowinski R, Pan TW, Hesla TI, Joseph DD, Periaux J (2001) A fictitious domain approach to the direct numerical simulation of incompressible viscous flow past moving rigid bodies: application to particulate flow. J Comput Phys 169:363–426
6. Pan TW, Joseph DD, Glowinski R (2001) Modelling Rayleigh–Taylor instability of a sedimenting suspension of several thousand circular particles in a direct numerical simulation. J Fluid Mech 434:23–37
7. Feng ZG, Michaelides EE (2005) Proteus: a direct forcing method in the simulations of particulate flows. J Comput Phys 202:20–51
8. Uhlmann M (2005) An immersed boundary method with direct forcing for the simulation of particulate flows. J Comput Phys 209:448–476
9. Tsuji T, Yabumoto K, Tanaka T (2008) Spontaneous structures in three-dimensional bubbling gas-fluidized bed by parallel DEM–CFD coupling simulation. Powder Technol 184(2):132–140
10. Tsuji T, Ito A, Tanaka T (2008) Multi-scale structure of clustering particles. Powder Technol 179(3):115–125
11. Jin S, Minev PD, Nandakumar K (2009) A scalable parallel algorithm for the direct numerical simulation of three-dimensional incompressible particulate flow. Int J Comput Fluid Dyn 23(5):427–437
12. Veeramani C, Minev PD, Nandakumar K (2007) A fictitious domain formulation for flows with rigid particles: a non-Lagrange multiplier version. J Comput Phys 224:867–879
13. Yu Z, Shao X, Wachs A (2006) A fictitious domain method for particulate flows with heat transfer. J Comput Phys 217(2):424–452
14. Yu Z, Wachs A (2007) A fictitious domain method for dynamic simulation of particle sedimentation in Bingham fluids. J Non-Newton Fluid Mech 145(2–3):78–91
15. Crowe C, Sommerfeld M, Tsuji Y (1998) Multiphase flows with droplets and particles. CRC Press, New York
16. Cundall PA, Strack ODL (1979) A discrete numerical model for granular assemblies. Geotechnique 29:47–65
17. Wu CY, Cocks ACF (2006) Numerical and experimental investigations of the flow of powder into a confined space. Mech Mater 38:304–324
18. Komiwes V, Mege P, Meimon Y, Herrmann H (2006) Simulation of granular flow in a fluid applied to sedimentation. Granul Matter 8:41–54
19. Yu Z, Wachs A, Peysson Y (2006) Numerical simulation of particle sedimentation in shear-thinning fluids with a fictitious domain method. J Non-Newton Fluid Mech 136:126–139
20. Haider A, Levenspiel O (1989) Drag coefficient and terminal velocity of spherical and non-spherical particles. Powder Technol 58:63–70
21. Yu Z, Phan-Thien N, Fan Y, Tanner RI (2002) Viscoelastic mobility problem of a system of particles. J Non-Newton Fluid Mech 104:87–124
22. ten Cate A, Nieuwstad CH, Derksen JJ, Van den Akker HEA (2002) Particle image velocimetry experiments and lattice-Boltzmann simulations on a single sphere settling under gravity. Phys Fluids 14(11):4012–4025
23. Feng ZG, Michaelides EE (2009) Robust treatment of no-slip boundary condition and velocity updating for the lattice-Boltzmann simulation of particulate flows. Comput Fluids 38(2):370–381
