ensemble propagation for e cient uncertainty quanti cation ... · e cient ensemble gmres with...
TRANSCRIPT
Ensemble propagation for efficient uncertainty quantification:Application to the thermomechanical modeling of a first mirror
for the ITER core CXRS diagnostics
K. Liegeois1, R. Boman1, E. T. Phipps2, Ph. Mertens3, Y. Krasikov3 and M. Arnst1
1Aerospace and Mechanical Engineering, Universite de Liege, Belgium2Sandia National Laboratories, USA
3Forschungszentrum Julich, Germany
UNCECOMP 2019Hersonissos, Greece
June 25, 2019
Ongoing PhD: New methods for parametric computations with multiphysics modelson HPC architectures with applications to design of opto-mechanical systems
ITER High performance computing library
Clusters Emerging architectures
1 / 15
Parametric computations
Sampling-based parametric computations typically require numerous calls of potentiallycostly models.
Example: Monte Carlo for uncertainty quantification.
Parameters
Model
. . .
Model
Quantity of interest
Goal of this work: to reduce the CPU time to evaluate a given set of samples.
2 / 15
Ensemble propagation
In sampling-based parametric computation, instead of individually evaluating each instance ofthe model, Ensemble propagation (EP) consists of simultaneously evaluating a subset ofsamples of the model.
Model Model
EP was introduced by [Phipps, 2017], made available in Stokhos a package of Trilinos, andimplemented using a template-based generic-programming approach:
template <typename T, int ensemble_size>class Ensemble{
T data[ensemble_size];Ensemble<T,ensemble_size> operator+ (const Ensemble<T,ensemble_size> &v);Ensemble<T,ensemble_size> operator- (const Ensemble<T,ensemble_size> &v);Ensemble<T,ensemble_size> operator* (const Ensemble<T,ensemble_size> &v);Ensemble<T,ensemble_size> operator/ (const Ensemble<T,ensemble_size> &v);//...
}
3 / 15
Ensemble propagation
Advantages of the EP:
I Reuse of common variables,I More opportunities for SIMD (more data parallelism),I Improved memory usage,I Reduction of Message Passing Interface (MPI) latency per sample.
Challenges of the EP:
I Increased memory usage,I Ensemble divergence:
I control flow divergence: if-then-else divergence and loop divergence,
A B
If
End
Inactive Active
If
End
I function call divergence. 4 / 15
Parametric linear systems
We want to solve a parametric linear system for a subset of s samples of the parameterstogether:
A::` x :` = b:` for all ` = 1, . . . , s.
Representation of a system for s = 4:
=
A X B
How to solve efficiently the parametric linear system with EP?
5 / 15
Reduced inner product used in conjugate gradient method
First approach [Phipps, 2017]: to gather the sample matrix into a block diagonal matrixA::1
. . .
A::s
x :1
...x :s
=
b:1...
b:s
,
and to apply the conjugate gradient method on the block diagonal system.
Mathematically equivalent to defining a reduced inner product:
= + + +
Advantages: No ensemble divergence and possibility to use efficient BLAS implementations,Challenges: The samples are coupled together, the spectra are gathered, and the conditionnumber increases.
6 / 15
Ensemble-typed inner product used in conjugate gradient method
Second approach [D’Elia, 2017]: to avoid the coupling of the samples together using anensemble-typed inner product:
=
It was first introduced for grouping purpose.
Advantages: No coupling: each sample converges as fast as if it was propagated alone,Challenges: Every ensemble divergence has to be managed explicitly.
7 / 15
Occurrence of ensemble divergence in GMRES
r (0) = b − Ax (0)
β = ‖r (0)‖v : 1 = r (0)/βfor j = 1, . . . ,m do
w = AM−1 v : j
h(1:j)j = VT:(1:j) w
v :(j+1) = w − V :(1:j) h(1:j)j
h(j+1) j = ‖v : (j+1)‖if h(j+1) j 6= 0 then
v : (j+1) = v : (j+1)/h(j+1) j
elsem = jbreak
if qT:(j+1)
e1 ≤ ε thenm = jbreak
y = arg minz ‖β e1 −H(1:m+1)(1:m) y‖x (m) = x (0) + M−1V :(1:m) y
Ensemble divergence in GMRES:
1. an Arnoldi vector can require anormalization or not: if-then-elsedivergence,
2. different samples may require differentnumbers of iterations to converge: loopdivergence,
3. called BLAS functions, such as GEMVfor the dense matrix-vector operations,may not support ensemble-typed inputs,leading to function call divergence.
8 / 15
Efficient ensemble GMRES with ensemble-typed inner product
In [Liegeois, in preparation], we describe and implement an efficient ensemble GMRESwithout ensemble reduction.
The control flow divergence and the function call divergence have been solved by:
I Implementing a Mask class which is used to apply masked assignment and logicalreduction;
I Implementing an efficient ensemble GEMV for the orthogonalization process.
Those two contributions lead to:
I An equivalent cost per iteration of ensemble GMRES with and without reduction;
I A safe implementation which is able to deal with converged samples.
9 / 15
Implemented code and its capabilities
I Fully templated C++ code heavilybased on Trilinos which provides a fullytemplated solver stack;
I Embedded in a Python interface. Thiseases the looping around samples, thegrouping of samples together, etc;
I Hybrid parallelism based on Tpetrawith MPI for distributed memory andKokkos with OpenMP for sharedmemory;
I Uses Gmsh [Geuzaine, 2009] to import3D meshes and VTK to write the outputfiles;
I Has already generated preliminary resultsfor industrial thermomechanicalcontact problems. 10 / 15
Test case 1: mesh tying problem
I Plate with a hole pulledon two opposite sides;
I Two meshes glued withthe Mortar finite elementmethod in saddle pointformulation;
I Lame parametersrepresented as aGaussian random field;
I Multigrid preconditionerwith saddle point matrixon each multigrid level;
I Solved on Intel(R)Xeon(R) Platinum 8160CPU.
t
t
Realization 1:
Realization 2:
11 / 15
Test case 1: speed-up of one GMRES iteration
1 8 16 24 321
1.5
2
2.5
Ensemble size
Sp
eed
-Up
:S
Sparse matrix vector product
Preconditioner
Orthogonalization with reduction
Orthogonalization without reduction
12 / 15
Test case 1: total speed-up
0 20 40 60100
120
140
160
180
200
Sample
Nu
mb
erof
iter
atio
ns
Ensemble size 1 8 16 32With reduction
Without reduction
1 8 16 24 321
1.5
2
2.5
Ensemble sizeS
pee
d-U
p:
S
With reduction
Without reduction
13 / 15
Test case 2: first results on the ITER mirror
MPI rank:
Temperature:
1 8 16 24 321
1.5
2
2.5
Ensemble size
Sp
eed
-Up
:S
With reduction
Without reduction
14 / 15
Conclusion
Conclusion and contributions:
I Two variants of GMRES can currently be used: with reduced inner product and withensemble-typed inner product;
I Cost per iteration of ensemble GMRES is independent of coupling the samples togetherwith ensemble reduction;
I Ensemble GMRES without reduction is faster due to an improved convergence comparedto ensemble GMRES with reduction.
Future work:
I Finalize the application of the method on engineering problems relevant for ITER incollaboration with FZ. Julich;
I Finalize the testing on more than one computational node to leverage the increasedmemory usage.
15 / 15
References
I M. D’Elia, E. T. Phipps, A. Rushdiz, and M. S. Ebeida, Surrogate-based EnsembleGrouping Strategies for Embedded Sampling-based Uncertainty Quantification. arXivpreprint arXiv:1705.02003, 2017.
I K. Liegeois, R. Boman, E.T. Phipps, T. Wiesner, and M. Arnst, Efficient GMRES withEnsemble Propagation: Application to non-Symmetric-Positive-Definite MechanicalProblems, In preparation.
I E.T. Phipps, M. D’Elia, H C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam,Embedded ensemble propagation for improving performance, portability, and scalabilityof uncertainty quantification on emerging computational architectures. SIAM Journal onScientific Computing, 2017, vol. 39, no 2, p. C162-C193.
Acknowledgement
The first author, Kim Liegeois, would like to acknowledge the BelgianNational Fund for Scientific Research (FNRS-FRIA) and the Federation
Wallonia-Brussels (FW-B) for their financial support.