Introduction Domain decomposition Numerical Experiments Conclusions
Domain decomposition methods for the finite element approximation of partial differential equations
ECAR Workshop 2012
Santiago Badia (1,2), Alberto F. Martín (2) and Javier Principe (1,2)
(1) Universitat Politècnica de Catalunya
(2) International Center for Numerical Methods in Engineering (CIMNE)
Buenos Aires, July 2012
Outline
Introduction
Domain decomposition methods
Numerical experiments
  • Coarse problem solution strategies
  • Weak scalability for 3d problems
Conclusions
Introduction
Finite element method (FEM)
• A method for the numerical solution of partial differential equations.
• Widely used in engineering analysis (users).
• Large community: conferences with 1K-3K attendees each year (researchers).
• Handles arbitrary geometries.
• Sound mathematical theory.
• "Easy" implementation (local structure).
Domain decomposition method (DDM)
• A divide-and-conquer approach to solving problems concurrently.
• A small community: conferences with 200-300 attendees each year (www.ddm.org).
• Mainly mathematically oriented; some HPC implementations exist, but not many:
  • TRILINOS (AztecOO, ML)
  • PETSc (BNN?, BDDC?)
Computational Methods in Fusion Technology (COMFUS)
EU Starting Grant awarded to S. Badia (5-year project, since 2011)
ITER (Experimental Fusion Reactor)
Magnetically confined plasma
• Very high temperature (150,000,000 °C)
• Fusion reaction ⇒ Heat + n
Test Blanket Module (TBM)
• Absorb heat and extract it
• Absorb n + Li → Tr (self-sustainment)
Blanket Modules
Overall strategy
• Stabilized FEM solvers
  • Based on multiscale concepts (model the subscales, i.e., the part of the unknown not captured by the grid).
  • Allow different problems to be treated with the same discretization (multiphysics).
  • Optimal a priori error estimates.
• Fully implicit schemes
  • Unconditional stability regardless of the time step size (multiple time scales).
  • Require the solution of a (very large) linear system per time step.
• Block iterative preconditioners
  • Built from positive definite operators (Laplace, convection-diffusion-reaction (CDR)).
  • Algorithmically scalable (independent of the discretization).
• Domain decomposition methods for positive definite problems
  • Hybrid (direct-iterative) robust methods.
  • Yield weakly scalable algorithms.
Problem statement
Given a bounded domain Ω and a FE partition T, we build a conforming (nodal) finite element (FE) space $W \subset H^1_0(\Omega)$.
• Strong problem: find $u \in W$ such that $L u = f$, where (as a model problem) $L = -\nabla^2$.
• Variational problem: find $u \in W$ such that
  $a(u, v) = (f, v)$ for any $v \in W$,
  where $f \in W'$ and (as a model problem) $a(u, v) = \int_\Omega \nabla u \cdot \nabla v \, d\Omega$.
• FEM approximation: expand the unknown u and the test function v in terms of the basis functions $N^a(x)$:
  $u(x) = \sum_{j=1}^{n} N^j(x)\, U_j, \qquad v(x) = \sum_{i=1}^{n} N^i(x)\, V_i$
• Algebraic problem: find $u \in \mathbb{R}^N$ such that
  $A u = b$,
  where $A_{ij} = a(N^i, N^j)$ is a symmetric positive definite matrix whose sparsity pattern depends on the mesh, and $b_i = (N^i, f)$.
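To make $A_{ij} = a(N^i, N^j)$ and $b_i = (N^i, f)$ concrete, here is a minimal sketch for the 1D analogue of the model problem: $-u'' = f$ on $(0, 1)$ with homogeneous Dirichlet conditions, uniform linear (P1) elements and a constant $f$. The 1D setting and the helper name `assemble_p1_laplacian` are assumptions for illustration only; the talk itself targets 2D/3D problems.

```python
import numpy as np

def assemble_p1_laplacian(n_el: int, f: float = 1.0):
    """Hypothetical helper: assemble A_ij = a(N^i, N^j) and b_i = (N^i, f) for
    -u'' = f on (0, 1), u(0) = u(1) = 0, with n_el uniform P1 elements.
    Only the n_el - 1 interior nodes are unknowns."""
    h = 1.0 / n_el
    n = n_el - 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    # Element stiffness for a(u, v) = int u' v' on an element of length h
    k_el = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    # Element load for constant f: int N^a f = f h / 2 for each local node
    f_el = np.array([f * h / 2, f * h / 2])
    for e in range(n_el):
        # Element e joins mesh nodes e and e + 1, i.e. unknowns e - 1 and e
        nodes = [e - 1, e]
        for a_loc, ga in enumerate(nodes):
            if not 0 <= ga < n:
                continue  # Dirichlet boundary node: not an unknown
            b[ga] += f_el[a_loc]
            for b_loc, gb in enumerate(nodes):
                if 0 <= gb < n:
                    A[ga, gb] += k_el[a_loc, b_loc]
    return A, b

A, b = assemble_p1_laplacian(8)
u = np.linalg.solve(A, b)
x = np.linspace(0.0, 1.0, 9)[1:-1]  # interior node coordinates
```

In 1D the P1 nodal values coincide with the exact solution $u = x(1 - x)/2$ for $f = 1$, which makes this a convenient sanity check; the resulting $A$ is the familiar tridiagonal matrix $\mathrm{tridiag}(-1, 2, -1)/h$.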
Contents of the talk
• Describe preconditioners of balancing type, which yield weakly scalable algorithms:
  • Balancing Neumann-Neumann (BNN)
  • Balancing DD by Constraints (BDDC)
• Mention our rehabilitation of the BNN method:
  • Easy/cheap treatment of singular Neumann problems.
  • Saves one Dirichlet solve per iteration (like the additive BDDC).
• Describe our hybrid implementation of domain decomposition methods (although the results shown are pure MPI).
• Present weak scalability results up to 4K cores for structured meshes.
• Present a comparison of two strategies for the solution of the coarse problem.
Domain partition
The global problem on $(\mathcal{T}_h, \Omega)$ is partitioned into $P$ local problems on $(\mathcal{T}_h^i, \Omega_i)$.
[Figure: global mesh of size h partitioned into subdomains of size H.]
The local (internal) interfaces $\Gamma_i = \partial\Omega_i \setminus \partial\Omega$ generate a global (internal) interface $\Gamma = \bigcup_{i=1}^{n_{sbd}} \Gamma_i$.
Now, we define local FE spaces $V_i$ related to $\Gamma_i$ and a global FE space $V$ related to $\Gamma$.
Blanket Modules
• Represent the mesh by a graph.
• Use a graph partitioning tool (e.g., METIS).
• Generate the interface matching information.
• Done as a preprocessing step.
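The partition and interface definitions can be illustrated with a small sketch. It assumes a structured 2D quadrilateral mesh and a simple geometric block partition of the elements as a stand-in for a graph partitioner such as METIS (which is not called here); the function name is hypothetical. For each mesh node it records which subdomains touch it (the interface matching information), and from that it extracts $\Gamma_i = \partial\Omega_i \setminus \partial\Omega$ and $\Gamma = \bigcup_i \Gamma_i$.

```python
def block_partition_interface(nx, ny, px, py):
    """Hypothetical helper: partition an nx-by-ny quad mesh of a rectangle into
    a px-by-py grid of element blocks (geometric stand-in for METIS).

    Returns
      node_parts : node (i, j) -> set of subdomains whose elements touch it
                   (the interface matching information)
      gamma      : subdomain s -> its interface nodes Gamma_s
      gamma_all  : the global interface Gamma, the union of all Gamma_s
    """
    def owner(ex, ey):  # subdomain id of element (ex, ey)
        return ex * px // nx + px * (ey * py // ny)

    node_parts = {}
    for ey in range(ny):
        for ex in range(nx):
            s = owner(ex, ey)
            for node in ((ex, ey), (ex + 1, ey), (ex, ey + 1), (ex + 1, ey + 1)):
                node_parts.setdefault(node, set()).add(s)

    def on_boundary(i, j):  # node lies on the global boundary dOmega
        return i in (0, nx) or j in (0, ny)

    # Gamma_s = dOmega_s minus dOmega: nodes shared with another subdomain,
    # excluding those that lie on the global boundary.
    gamma = {}
    for node, parts in node_parts.items():
        if len(parts) > 1 and not on_boundary(*node):
            for s in parts:
                gamma.setdefault(s, set()).add(node)
    gamma_all = set().union(*gamma.values())
    return node_parts, gamma, gamma_all

# 4 x 4 elements split into 2 x 2 subdomains
node_parts, gamma, gamma_all = block_partition_interface(4, 4, 2, 2)
```

In this toy case the global interface is the interior part of the two mid-lines (5 nodes), each subdomain sees 3 of them, and the central cross point is shared by all four subdomains.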
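Interface (Schur complement) problem
• The partition induces a block structure
$$A u = \begin{bmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{bmatrix} \begin{bmatrix} u_I \\ u_\Gamma \end{bmatrix} = \begin{bmatrix} b_I \\ b_\Gamma \end{bmatrix} = b,$$
where $A_{II} = \mathrm{diag}\big(A_{II}^{(1)}, A_{II}^{(2)}, \dots, A_{II}^{(P)}\big)$.
• After static condensation of the interior (bubble) unknowns $u_I$:
$$S u_\Gamma = g,$$
where the Schur complement $S \in \mathbb{R}^{n_\Gamma \times n_\Gamma}$ and $g \in \mathbb{R}^{n_\Gamma}$ read
$$S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma}, \qquad g = b_\Gamma - A_{\Gamma I} A_{II}^{-1} b_I.$$
• Local solvers are based on external, cutting-edge multi-threaded sparse direct libraries (e.g., PARDISO).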
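A minimal dense sketch of the static condensation, assuming the 1D P1 Laplacian as a stand-in for $A$ and four subdomains separated by single interface nodes. Dense `numpy.linalg.solve` calls replace the local sparse direct factorizations (PARDISO in the talk); the helper name `schur_complement` is hypothetical.

```python
import numpy as np

def schur_complement(A, b, gamma):
    """Hypothetical helper: condense the interior unknowns out of A u = b.

    gamma: indices of the interface unknowns; all others are interior (I).
    Returns S, g and the interior blocks needed to recover u_I.
    """
    interior = np.setdiff1d(np.arange(A.shape[0]), gamma)
    A_II = A[np.ix_(interior, interior)]
    A_IG = A[np.ix_(interior, gamma)]
    A_GI = A[np.ix_(gamma, interior)]
    A_GG = A[np.ix_(gamma, gamma)]
    # Dense solves for illustration only: in a DD code A_II is block diagonal
    # and each block A_II^(i) is factorized by a local sparse direct solver.
    S = A_GG - A_GI @ np.linalg.solve(A_II, A_IG)
    g = b[gamma] - A_GI @ np.linalg.solve(A_II, b[interior])
    return S, g, interior, A_II, A_IG

# 1D model: 24 P1 elements on (0, 1), P = 4 subdomains of 6 elements each
n = 23
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
b = h * np.ones(n)                      # load vector for f = 1
gamma = np.array([5, 11, 17])           # unknowns at x = 0.25, 0.5, 0.75
S, g, interior, A_II, A_IG = schur_complement(A, b, gamma)
u_G = np.linalg.solve(S, g)                              # interface problem
u_I = np.linalg.solve(A_II, b[interior] - A_IG @ u_G)    # local back-substitution
u = np.empty(n)
u[gamma], u[interior] = u_G, u_I
```

Reassembling $u_\Gamma$ and $u_I$ recovers the solution of the full system, and $A_{II}$ has no coupling between different subdomains, which is what allows the interior solves to run concurrently.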
Solution methods
The conjugate gradient method requires $O(\sqrt{\kappa})$ iterations and can be applied to
• the whole problem ($A u = b$):
  $\kappa(A) \le C h^{-2} = C N^{2/d}$
• the interface problem ($S x = g$):
  $\kappa(S) \le C H^{-1} h^{-1} = C P^{1/d} N^{1/d}$
• the interface problem ($S x = g$) preconditioned using the inverses of the local matrices $S_i$ (Neumann-Neumann):
  $\kappa(B_{NN}^{-1} S) \le C H^{-2} \left[1 + \log^2(H/h)\right] = C P^{2/d} \left[1 + d^{-2} \log^2(N/P)\right]$
• the interface problem ($S x = g$) preconditioned by balancing methods:
  $\kappa(B_{BDD}^{-1} S) \le C \left[1 + \log^2(H/h)\right] = C \left[1 + d^{-2} \log^2(N/P)\right]$
Here $H \sim P^{-1/d}$ and $h \sim N^{-1/d}$, so that $H/h \sim (N/P)^{1/d}$.
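The $h^{-2}$ growth of $\kappa(A)$ can be checked numerically on a small case. This sketch assumes the 1D P1 Dirichlet Laplacian ($d = 1$, so $N^{2/d} = N^2$): halving $h$ should multiply $\kappa(A)$ by roughly 4. For contrast it tabulates $1 + \log^2(H/h)$ for the local sizes used later in the talk; the natural logarithm is an arbitrary choice here, since the base only changes the constant $C$.

```python
import numpy as np

def kappa_laplacian_1d(n):
    """Spectral condition number of the 1D P1 Dirichlet Laplacian, n unknowns."""
    h = 1.0 / (n + 1)
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    eig = np.linalg.eigvalsh(A)  # ascending eigenvalues of the SPD matrix
    return eig[-1] / eig[0]

# Halving h (n -> 2n + 1 unknowns) should multiply kappa(A) by about 2^(2/d) = 4
kappas = [kappa_laplacian_1d(n) for n in (31, 63, 127)]
ratios = [k2 / k1 for k1, k2 in zip(kappas, kappas[1:])]

# The balancing bound 1 + log^2(H/h) grows only polylogarithmically in H/h
bdd_bound = {Hh: 1 + np.log(Hh) ** 2 for Hh in (10, 20, 30, 40)}
```

Since CG needs $O(\sqrt{\kappa})$ iterations, the first estimate implies iteration counts that double with each mesh refinement in 1D, whereas the balancing bound grows only slowly when going from $H/h = 10$ to $H/h = 40$.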
Balancing Neumann-Neumann (BNN)
• Introduce a global (coarse) approximation (balancing) $B_C^{-1}$ and define the multiplicative preconditioner
  $B_{BNN}^{-1} = B_C^{-1} + (I - B_C^{-1} S)\, B_{NN}^{-1}$
• The coarse space is $H_0 = \mathrm{span}\{\phi_i : i = 1, \dots, n_{sbd}\}$, where $\phi_i = I_i 1_{\Gamma_i}$.
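A sketch of the coarse ingredients, under two assumptions not spelled out on the slide: $I_i$ is the usual inverse-multiplicity (counting) weighting, so the $\phi_i$ form a partition of unity on $\Gamma$, and $B_C^{-1}$ is the Galerkin coarse correction $R_0^T (R_0 S R_0^T)^{-1} R_0$, where row $i$ of $R_0$ holds $\phi_i$. The interface data and the helper names are hypothetical toy choices. The check shows what "balancing" means: after the coarse correction, the residual has zero weighted mean on every $\Gamma_i$, which makes the local Neumann problems on floating subdomains solvable.

```python
import numpy as np

def bnn_coarse_basis(subdomains_of_node, n_sub):
    """Hypothetical helper: row i holds phi_i on the interface nodes.

    Assumes I_i is the inverse-multiplicity weighting:
    phi_i(x_k) = 1 / #{j : x_k in Gamma_j} if x_k in Gamma_i, else 0.
    """
    R0 = np.zeros((n_sub, len(subdomains_of_node)))
    for k, subs in enumerate(subdomains_of_node):
        for i in subs:
            R0[i, k] = 1.0 / len(subs)
    return R0

def coarse_operator(S, R0):
    """Galerkin coarse correction B_C^{-1} = R0^T (R0 S R0^T)^{-1} R0."""
    A0 = R0 @ S @ R0.T
    return R0.T @ np.linalg.solve(A0, R0)

# Toy interface: 12 nodes shared among 3 subdomains; dense random SPD S
rng = np.random.default_rng(0)
subs = [[0, 1]] * 4 + [[1, 2]] * 4 + [[0, 1, 2]] * 4
R0 = bnn_coarse_basis(subs, 3)
M = rng.standard_normal((12, 12))
S = M @ M.T + 12 * np.eye(12)
BC = coarse_operator(S, R0)
r = rng.standard_normal(12)
r_bal = r - S @ BC @ r       # residual after the coarse correction
print(R0.sum(axis=0))        # partition of unity: should be all ones
print(R0 @ r_bal)            # balanced: should be ~0 for every subdomain
```

The identity behind the check is $R_0 S B_C^{-1} = R_0$, so $R_0 (I - S B_C^{-1}) r = 0$ for any $r$.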
BNN rehabilitation
Currently
• Drawback 1: dealing with singular matrices $S_i$ (pseudo-inverses).
• Drawback 2: requires 2 Dirichlet solves per iteration (compared with one in the additive BDDC).
• Deprecated; outperformed by BDDC...
but we propose a rehabilitation
• Definite matrices are obtained by fixing appropriately chosen degrees of freedom.
• By reusing preconditioner computations in the Schur complement multiplication, we can spare one Dirichlet solve.
• This also reduces nearest-neighbor communications (more scalable).
and there are advantages
• Smaller coarse problem in 3d (not in 2d).
• Easier implementation.
Balancing DD by Constraints (BDDC)
• A discontinuous coarse space is used (not Galerkin).
• Basis functions are defined locally as the solution of a problem that enforces continuity of values at corners and of mean values on edges (faces).
• These constraints lead to positive definite local problems (IF properly applied), so external libraries can be used.
• It is defined as an additive correction (1 DS + 1 NS + 1 CS per iteration: one Dirichlet, one Neumann and one coarse solve).
PCG algorithm
BNN_PCG (Input: (S, B_BNN, g, x_0), Output: x)
  r_0 := g − S x_0
  z_0 := B_BNN^{-1} (I − S B_C^{-1}) r_0
  p_0 := z_0
  for j = 0, 1, ..., till convergence do
    α_j := (r_j, z_j) / (S p_j, p_j)            (GR + LDS*)
    x_{j+1} := x_j + α_j p_j
    r_{j+1} := r_j − α_j S p_j
    z_{j+1} := B_NN^{-1} r_{j+1}                (LC + LNS + LC)
    s := r_{j+1} − S z_{j+1}                    (LDS)
    z_{j+1} := z_{j+1} + B_C^{-1} s             (GC + GCS)
    β_j := (r_{j+1}, z_{j+1}) / (r_j, z_j)      (GR)
    p_{j+1} := z_{j+1} + β_j p_j
  end for
• Local operations (LC: communication, LNS: Neumann solver, LDS: Dirichlet solver)
• Global operations (GR: reduction, GC: communication, GCS: coarse solver)
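The algorithm above can be sketched with dense NumPy matrices standing in for the distributed operators. Several pieces are assumptions for illustration: $B_C^{-1}$ is taken as the Galerkin correction $R_0^T (R_0 S R_0^T)^{-1} R_0$, a Jacobi scaling stands in for the local Neumann solves $B_{NN}^{-1}$, and the initial guess is coarse-corrected so that $r_0$ is balanced, in which case $(I - S B_C^{-1}) r_0 = r_0$ and the slide's $z_0$ reduces to $B_{BNN}^{-1} r_0$. The recurrence $S p_{j+1} = S z_{j+1} + \beta_j S p_j$ (with $S z_{j+1}$ assembled from the product already computed for $s$ and a precomputed $S R_0^T$) is one possible reading of the "LDS*" mark and of the saved Dirichlet solve, not necessarily how FEMPAR does it.

```python
import numpy as np

def bnn_pcg(S, g, R0, apply_local, tol=1e-10, maxit=500):
    """BNN-preconditioned CG for S x = g (dense sketch).

    S           : SPD interface matrix (assembled here only for illustration)
    R0          : coarse restriction; row i is the coarse basis function phi_i
    apply_local : callable r -> B_NN^{-1} r (stand-in for local Neumann solves)
    Returns (x, iterations, number of products with S).
    """
    n_S = 0

    def apply_S(v):  # one product with S = one Dirichlet solve (LDS)
        nonlocal n_S
        n_S += 1
        return S @ v

    A0 = R0 @ S @ R0.T  # coarse matrix, factorized once (setup)
    S_R0T = S @ R0.T    # S times coarse basis, precomputed (setup)

    def coarse(v):      # coefficients c such that B_C^{-1} v = R0^T c
        return np.linalg.solve(A0, R0 @ v)

    def precondition(r):
        """Return z = B_BNN^{-1} r and S z, with a single product with S."""
        z = apply_local(r)             # B_NN^{-1} r        (LC + LNS + LC)
        Sz = apply_S(z)                # s = r - S z        (LDS)
        c = coarse(r - Sz)             # z += B_C^{-1} s    (GC + GCS)
        return z + R0.T @ c, Sz + S_R0T @ c

    x = np.zeros_like(g)
    r = g - apply_S(x)
    c = coarse(r)                      # balance the initial residual: R0 r = 0
    x += R0.T @ c
    r -= S_R0T @ c
    z, Sz = precondition(r)            # equals the slide's z_0 for balanced r
    p, Sp = z.copy(), Sz.copy()
    rz = r @ z
    g_norm = np.linalg.norm(g)
    for j in range(maxit):
        alpha = rz / (Sp @ p)
        x += alpha * p
        r -= alpha * Sp
        if np.linalg.norm(r) <= tol * g_norm:
            return x, j + 1, n_S
        z, Sz = precondition(r)
        rz_new = r @ z
        beta = rz_new / rz
        rz = rz_new
        p = z + beta * p
        Sp = Sz + beta * Sp            # S p by recurrence: no extra LDS
    return x, maxit, n_S

# Toy check: 1D Laplacian as S, 4 coarse functions (indicators of 5-node
# blocks), Jacobi as a stand-in for B_NN^{-1}.
n = 20
S = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R0 = np.kron(np.eye(4), np.ones((1, 5)))
g = np.ones(n)
x, iters, n_S = bnn_pcg(S, g, R0, lambda r: r / np.diag(S))
```

With this arrangement each iteration performs exactly one product with $S$ (one Dirichlet solve), plus one for the initial residual.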
Coarse problem solution strategies
• Serial gather (SG): MPI rank 0 is responsible for the serial computation of the coarse-grid correction.
• All gather (AG): all MPI ranks are responsible for the serial computation of the coarse-grid correction (each rank solves the coarse problem redundantly).
[Figure: SG workflow per iteration. All ranks: N Solver, nearest-neighbor comm., updates, nearest-neighbor comm., D Solver; then global reduction, global gather to rank 0, C Solver on rank 0 only, and global scatter.]
[Figure: AG workflow per iteration. Same local phases as SG; then global reduction, global all-gather, and C Solver on every rank.]
Experimental framework (Software + Platform)
FEMPAR: Finite Element Multiphysics PARallel software (in-house)
• MPI implementation of sub-structuring DDMs.
• Relies on highly efficient vendor implementations of the BLAS (Intel MKL, IBM ESSL, etc.).
• Provides interfaces to external multi-threaded sparse direct solvers (PARDISO, WSMP, etc.).
• Although the codes are hybrid MPI/OpenMP, the focus here is on the pure MPI model, with a one-to-one mapping among subdomains, MPI tasks and physical cores.
running on
• MareNostrum@BSC (2560 JS21 blades, 10240 cores)
Coarse problem solution strategies
[Figure: total wall clock time (secs.) vs. #cores (16, 64, 144, 256, 400, 576, 784, 1024) for BNN SG and BNN AG.]
Serial gather (SG) strategy, using the PARAVER tool developed at BSC
[Figures: PARAVER execution traces for P = 16, 64, 144, 256, 400, 576, 784 and 1024.]
Global collective communications
Using the IBM XL compiler and MPICH 1.2.7, with variable- and fixed-size collectives
[Figure: Scatter and Scatterv wall clock time [µs] vs. #cores (16 to 1024) for message sizes MS = 16, 64, 128, 256 and 512.]
Coarse problem solution strategies
[Figure: total wall clock time (secs.) vs. #cores (16 to 4096) for BNN SG and BNN AG.]
Weak scalability for 3D problems
• Target problem: $-\Delta u = f$ on the rectangular prism $\Omega = [0, 2] \times [0, 2] \times [0, 1]$.
• Uniform global mesh of hexahedral Q1 finite elements.
• Uniform domain partition into a $2m \times 2m \times m$ grid of rectangular prism subdomains, each with a hexahedral local mesh.
• We use 4 cores/blade and $m = 2, 3, 4, \dots, 10$.
• Gradually larger local problem sizes: $H/h = 10, 20, 30, 40$.
• Weak scalability: at which rate does a given magnitude evolve as the number of cores increases while $H/h$ is kept constant?
• Focus on the total computation time and the number of PCG iterations for the interface problem.
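The problem sizes behind this weak scalability setup follow from simple arithmetic, sketched below. The sketch assumes one subdomain per core and cubic subdomains of side $H = 1/m$ (which follows from splitting $[0,2] \times [0,2] \times [0,1]$ into a $2m \times 2m \times m$ grid); the function name is hypothetical, and the node count includes boundary vertices, since the boundary conditions are not stated.

```python
def weak_scaling_setup(m, H_over_h):
    """Hypothetical helper: sizes of the 3D weak-scaling test.

    Omega = [0,2]x[0,2]x[0,1] is split into a 2m x 2m x m grid of cubic
    subdomains of side H = 1/m, each meshed with (H/h)^3 Q1 hexahedra.
    """
    n_sub = (2 * m) * (2 * m) * m  # = number of cores (one subdomain per core)
    local_elems = H_over_h ** 3
    # Global number of elements per direction
    ne = (2 * m * H_over_h, 2 * m * H_over_h, m * H_over_h)
    global_elems = ne[0] * ne[1] * ne[2]
    # Q1 vertices, boundary vertices included
    global_nodes = (ne[0] + 1) * (ne[1] + 1) * (ne[2] + 1)
    return n_sub, local_elems, global_elems, global_nodes

cores = [weak_scaling_setup(m, 10)[0] for m in (2, 4, 5, 6, 7, 8, 9, 10)]
largest = weak_scaling_setup(10, 40)
```

The core counts $4m^3$ match the #cores values (32 to 4000) in the BNN vs. BDDC plots, and the largest run ($m = 10$, $H/h = 40$) has $4000 \times 40^3 = 2.56 \times 10^8$ hexahedra, while the local problem size per core stays fixed at $(H/h)^3$.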
BNN vs. BDDC (PCG iterations)
[Figure: number of PCG iterations vs. #cores (32 to 4000) for BNN, BDDC.CE and BDDC.CEF; panels H/h = 10 and H/h = 20.]
• BNN ∼ BDDC(ce)
• BDDC(cef): small reduction with respect to BDDC(ce)
[Figure (continued): number of PCG iterations vs. #cores (32 to 4000) for BNN, BDDC.CE and BDDC.CEF; panels H/h = 30 and H/h = 40.]
BNN vs. BDDC (total computation time)
[Figure: total wall clock time (secs.) vs. #cores (32 to 4000) for BNN, BDDC.CE and BDDC.CEF; panels H/h = 10 and H/h = 20.]
• Enhanced BNN outperforms BDDC(ce) as p ↑ or H/h ↓ (dominant coarse solver)
• Almost identical as H/h ↑ (BUT the enhancement is basic)
• BDDC(cef) not competitive
[Figure (continued): total wall clock time (secs.) vs. #cores (32 to 4000) for BNN, BDDC.CE and BDDC.CEF; panels H/h = 30 and H/h = 40.]
Conclusions
• Enhanced BNN:
  • Sparse direct solvers for definite matrices (PARDISO).
  • Saves a Dirichlet solve per iteration (= additive BDDC).
• Hybrid implementation of BDD methods:
  • Weakly scalable in terms of iteration count (in accordance with the $1 + \log^2(H/h)$ estimate of the condition number).
  • Weakly scalable in terms of CPU time when p ↓ or H/h ↑ (dominant fine solver).
  • The coarse problem is solved faster using the serial gather strategy.
• BDD comparison:
  • 2d: BNN and BDDC(c) similar; BNN vs. BDDC(ce) depends on H/h (BNN does not outperform BDDC(*)).
  • 3d: BNN very competitive (superior as p ↑ or H/h ↓).
Current and future work
• Porting our code to other platforms (CURIE, JUROPA/HPC-FF).
• Extension to the (monolithic) elasticity problem.
• Comprehensive hybrid tests.
• Comprehensive unstructured tests.
• Other strategies for the treatment of the coarse problem (multilevel, additive).
S. Badia, A. F. Martín and J. Principe. Enhanced balancing Neumann-Neumann preconditioning in computational fluid and solid mechanics. In preparation.
S. Badia, A. F. Martín and J. Principe. Implementation and weak scalability study of domain decomposition methods of balancing type. In preparation.
Acknowledgements:
• European Research Council (ERC) (funding)
• Spanish Supercomputing Network (RES) (computer resources, technical expertise and assistance)
Thank you!
Javier Principe
http://principe.rmee.upc.edu/
http://www.cimne.com/comfus/