Ensemble propagation for efficient uncertainty quantification: Application to the thermomechanical modeling of a first mirror for the ITER core CXRS diagnostics

K. Liegeois (1), R. Boman (1), E. T. Phipps (2), Ph. Mertens (3), Y. Krasikov (3) and M. Arnst (1)

(1) Aerospace and Mechanical Engineering, Université de Liège, Belgium
(2) Sandia National Laboratories, USA
(3) Forschungszentrum Jülich, Germany

UNCECOMP 2019, Hersonissos, Greece

June 25, 2019

Ongoing PhD: New methods for parametric computations with multiphysics models on HPC architectures with applications to design of opto-mechanical systems

[Figure: ITER; high performance computing library; clusters; emerging architectures]

Parametric computations

Sampling-based parametric computations typically require numerous calls of potentially costly models.

Example: Monte Carlo for uncertainty quantification.

[Figure: each sample of the parameters is passed through the model to obtain the quantity of interest.]
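To make the cost structure concrete, here is a minimal sketch of plain Monte Carlo sampling. The function `model`, the uniform parameter distribution, and the name `monte_carlo_mean` are hypothetical placeholders chosen for illustration; they are not taken from the presented code.

```cpp
#include <cassert>
#include <random>

// Hypothetical model: maps one parameter sample to a quantity of interest.
double model(double p) { return p * p; }

// Plain Monte Carlo estimate of the mean quantity of interest:
// one model call per sample, evaluated one after another.
double monte_carlo_mean(int n_samples, unsigned seed) {
  assert(n_samples > 0);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  double sum = 0.0;
  for (int i = 0; i < n_samples; ++i) {
    sum += model(dist(gen));  // the costly call in a real application
  }
  return sum / n_samples;
}
```

In this toy setting the estimate approaches 1/3 as the number of samples grows; in a real application each call to `model` would be a full simulation, which is why the total time is dominated by the repeated model evaluations.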

Goal of this work: to reduce the CPU time to evaluate a given set of samples.


Ensemble propagation

In sampling-based parametric computations, instead of evaluating each instance of the model individually, ensemble propagation (EP) evaluates a subset of samples of the model simultaneously.

[Figure: model evaluations grouped into an ensemble]

EP was introduced by [Phipps, 2017], made available in Stokhos, a package of Trilinos, and implemented using a template-based generic-programming approach:

    template <typename T, int ensemble_size>
    class Ensemble {
    public:
      T data[ensemble_size];  // one value per sample of the ensemble
      // Arithmetic operators act sample by sample.
      Ensemble<T, ensemble_size> operator+(const Ensemble<T, ensemble_size> &v);
      Ensemble<T, ensemble_size> operator-(const Ensemble<T, ensemble_size> &v);
      Ensemble<T, ensemble_size> operator*(const Ensemble<T, ensemble_size> &v);
      Ensemble<T, ensemble_size> operator/(const Ensemble<T, ensemble_size> &v);
      //...
    };
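The sketch below shows how code that is templated on its scalar type could be evaluated for all samples of an ensemble in a single call. It is a simplified stand-in rather than the Stokhos implementation: the struct layout, the bounds-checked `operator[]`, and the toy function `model` are assumptions made for this example.

```cpp
#include <cassert>

// Simplified stand-in for an ensemble type (hypothetical, not Stokhos code).
template <typename T, int ensemble_size>
struct Ensemble {
  T data[ensemble_size];

  // Bounds-checked access to the value of sample l.
  T& operator[](int l) {
    assert(0 <= l && l < ensemble_size);
    return data[l];
  }
  const T& operator[](int l) const {
    assert(0 <= l && l < ensemble_size);
    return data[l];
  }

  // Arithmetic is applied sample by sample.
  Ensemble operator+(const Ensemble& v) const {
    Ensemble r;
    for (int l = 0; l < ensemble_size; ++l) r.data[l] = data[l] + v.data[l];
    return r;
  }
  Ensemble operator*(const Ensemble& v) const {
    Ensemble r;
    for (int l = 0; l < ensemble_size; ++l) r.data[l] = data[l] * v.data[l];
    return r;
  }
};

// Toy model written once, generic in its scalar type.
template <typename Scalar>
Scalar model(const Scalar& k, const Scalar& u) {
  return k * u + u;
}
```

Calling `model` with two `Ensemble<double, 4>` arguments evaluates four samples at once, and each entry matches the result of calling `model` with the corresponding pair of `double` values. The inner loops over `l` have a fixed trip count, which is what gives the compiler the extra opportunity for SIMD.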


Ensemble propagation

Advantages of the EP:

- Reuse of common variables,
- More opportunities for SIMD (more data parallelism),
- Improved memory usage,
- Reduction of Message Passing Interface (MPI) latency per sample.

Challenges of the EP:

- Increased memory usage,
- Ensemble divergence:
  - control flow divergence: if-then-else divergence and loop divergence,
  - function call divergence.

[Figure: if-then-else divergence between two branches A and B, with active and inactive samples in each branch.]

Parametric linear systems

We want to solve a parametric linear system for a subset of $s$ samples of the parameters together:

$$A_{::\ell}\, x_{:\ell} = b_{:\ell} \quad \text{for all } \ell = 1, \dots, s.$$

Representation of a system for $s = 4$:

[Figure: the ensemble system A X = B, with the four samples stored together.]

How can the parametric linear system be solved efficiently with EP?


Reduced inner product used in conjugate gradient method

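First approach [Phipps, 2017]: gather the sample matrices into a block diagonal matrix

$$
\begin{bmatrix} A_{::1} & & \\ & \ddots & \\ & & A_{::s} \end{bmatrix}
\begin{bmatrix} x_{:1} \\ \vdots \\ x_{:s} \end{bmatrix}
=
\begin{bmatrix} b_{:1} \\ \vdots \\ b_{:s} \end{bmatrix},
$$

and apply the conjugate gradient method to the block diagonal system.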

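Mathematically equivalent to defining a reduced inner product, which sums the inner products of the individual samples (illustrated on the slide for $s = 4$):

$$\langle x, y \rangle = \sum_{\ell=1}^{s} x_{:\ell}^T\, y_{:\ell}$$

Advantages: No ensemble divergence and possibility to use efficient BLAS implementations.

Challenges: The samples are coupled together, the spectra are gathered, and the condition number increases.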
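A minimal sketch of a reduced inner product is given below. The storage layout (`EnsembleVector`, with the sample index innermost) and the function name `reduced_dot` are assumptions for illustration and do not reproduce the Stokhos data structures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: entry i holds the S sample values of component i.
template <std::size_t S>
using EnsembleVector = std::vector<std::array<double, S>>;

// Reduced inner product: the S per-sample dot products are summed into a
// single scalar, i.e. the inner product of the stacked block diagonal system.
template <std::size_t S>
double reduced_dot(const EnsembleVector<S>& x, const EnsembleVector<S>& y) {
  assert(x.size() == y.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t l = 0; l < S; ++l) {
      sum += x[i][l] * y[i][l];
    }
  }
  return sum;
}
```

Because the result is one scalar shared by all samples, every quantity derived from it in the Krylov iteration (step lengths, stopping test) is also shared, which is how the samples become coupled.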


Ensemble-typed inner product used in conjugate gradient method

Second approach [D’Elia, 2017]: avoid coupling the samples together by using an ensemble-typed inner product, which keeps one inner product per sample:

$$\langle x, y \rangle_{\ell} = x_{:\ell}^T\, y_{:\ell} \quad \text{for all } \ell = 1, \dots, s.$$

It was first introduced for grouping purposes.

Advantages: No coupling: each sample converges as fast as if it were propagated alone.

Challenges: Every ensemble divergence has to be managed explicitly.
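For comparison, an ensemble-typed inner product could be sketched as follows. As before, the layout `EnsembleVector` and the name `ensemble_dot` are hypothetical and only serve the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: entry i holds the S sample values of component i.
template <std::size_t S>
using EnsembleVector = std::vector<std::array<double, S>>;

// Ensemble-typed inner product: no reduction across samples, the result is
// itself an ensemble holding one dot product per sample.
template <std::size_t S>
std::array<double, S> ensemble_dot(const EnsembleVector<S>& x,
                                   const EnsembleVector<S>& y) {
  assert(x.size() == y.size());
  std::array<double, S> result{};  // zero-initialized
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t l = 0; l < S; ++l) {
      result[l] += x[i][l] * y[i][l];
    }
  }
  return result;
}
```

Since the result is an ensemble, any comparison made on it (for instance against a tolerance) can give a different answer for each sample, which is where the ensemble divergence has to be handled.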


Occurrence of ensemble divergence in GMRES

    $r^{(0)} = b - A x^{(0)}$
    $\beta = \|r^{(0)}\|$
    $v_{:1} = r^{(0)} / \beta$
    for $j = 1, \dots, m$ do
        $w = A M^{-1} v_{:j}$
        $h_{(1:j)j} = V_{:(1:j)}^T w$
        $v_{:(j+1)} = w - V_{:(1:j)} h_{(1:j)j}$
        $h_{(j+1)j} = \|v_{:(j+1)}\|$
        if $h_{(j+1)j} \neq 0$ then
            $v_{:(j+1)} = v_{:(j+1)} / h_{(j+1)j}$
        else
            $m = j$
            break
        if $q_{:(j+1)}^T e_1 \le \varepsilon$ then
            $m = j$
            break
    $y = \arg\min_z \|\beta e_1 - H_{(1:m+1)(1:m)} z\|$
    $x^{(m)} = x^{(0)} + M^{-1} V_{:(1:m)} y$

Ensemble divergence in GMRES:

1. an Arnoldi vector can require a normalization or not: if-then-else divergence,

2. different samples may require different numbers of iterations to converge: loop divergence,

3. called BLAS functions, such as GEMV for the dense matrix-vector operations, may not support ensemble-typed inputs, leading to function call divergence.
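The third item can be illustrated with a hand-written kernel for the product $h = V^T w$ used in the orthogonalization. The sketch below is not the kernel of [Liegeois, in preparation]; the alias `Ens`, the column-wise storage of `V`, and the name `ensemble_gemv_t` are assumptions for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ensemble scalar: S sample values stored contiguously.
template <std::size_t S>
using Ens = std::array<double, S>;

// Computes h = V^T w for all samples at once, without reduction across
// samples. V is stored as a list of columns: V[k][i] is row i of column k.
template <std::size_t S>
std::vector<Ens<S>> ensemble_gemv_t(const std::vector<std::vector<Ens<S>>>& V,
                                    const std::vector<Ens<S>>& w) {
  std::vector<Ens<S>> h(V.size(), Ens<S>{});  // one ensemble per column, zeroed
  for (std::size_t k = 0; k < V.size(); ++k) {
    assert(V[k].size() == w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
      // Innermost loop over the samples: fixed length, contiguous data.
      for (std::size_t l = 0; l < S; ++l) {
        h[k][l] += V[k][i][l] * w[i][l];
      }
    }
  }
  return h;
}
```

Keeping the sample index innermost means the same instruction is applied to all samples of an entry, so the kernel avoids calling a scalar BLAS routine once per sample.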


Efficient ensemble GMRES with ensemble-typed inner product

In [Liegeois, in preparation], we describe and implement an efficient ensemble GMRES without ensemble reduction.

The control flow divergence and the function call divergence are handled by:

- Implementing a Mask class, which is used to apply masked assignments and logical reductions;
- Implementing an efficient ensemble GEMV for the orthogonalization process.

These two contributions lead to:

- An equivalent cost per iteration of ensemble GMRES with and without reduction;
- A safe implementation that is able to deal with converged samples.
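The idea of masked assignment and logical reduction can be sketched on the normalization step of the Arnoldi process. The code below is an illustrative assumption, not the Mask class of the presented implementation: `Mask`, `not_equal`, `assign_where`, and `masked_normalize` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mask: one boolean per sample of the ensemble.
template <std::size_t S>
struct Mask {
  std::array<bool, S> bits{};

  // Logical reductions over the samples.
  bool any() const {
    for (bool b : bits) if (b) return true;
    return false;
  }
  bool all() const {
    for (bool b : bits) if (!b) return false;
    return true;
  }
};

template <std::size_t S>
using EnsVec = std::vector<std::array<double, S>>;

// Comparison of an ensemble with a scalar returns a mask, not a single bool.
template <std::size_t S>
Mask<S> not_equal(const std::array<double, S>& a, double value) {
  Mask<S> m;
  for (std::size_t l = 0; l < S; ++l) m.bits[l] = (a[l] != value);
  return m;
}

// Masked assignment: dst takes the value of src only for selected samples.
template <std::size_t S>
void assign_where(const Mask<S>& m, EnsVec<S>& dst, const EnsVec<S>& src) {
  assert(dst.size() == src.size());
  for (std::size_t i = 0; i < dst.size(); ++i)
    for (std::size_t l = 0; l < S; ++l)
      dst[i][l] = m.bits[l] ? src[i][l] : dst[i][l];
}

// Normalization v = v / h, applied only to the samples where h != 0.
template <std::size_t S>
Mask<S> masked_normalize(EnsVec<S>& v, const std::array<double, S>& h) {
  const Mask<S> active = not_equal(h, 0.0);
  EnsVec<S> scaled(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    for (std::size_t l = 0; l < S; ++l)
      scaled[i][l] = v[i][l] / (active.bits[l] ? h[l] : 1.0);  // safe divisor
  assign_where(active, v, scaled);
  return active;
}
```

The returned mask can then drive the loop: with reductions such as `any()` and `all()`, the iteration can continue while at least one sample is still active, and the masked assignment leaves the samples that are no longer active untouched.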


Implemented code and its capabilities

- Fully templated C++ code heavily based on Trilinos, which provides a fully templated solver stack;
- Embedded in a Python interface. This eases the looping around samples, the grouping of samples together, etc.;
- Hybrid parallelism based on Tpetra with MPI for distributed memory and Kokkos with OpenMP for shared memory;
- Uses Gmsh [Geuzaine, 2009] to import 3D meshes and VTK to write the output files;
- Has already generated preliminary results for industrial thermomechanical contact problems.

Test case 1: mesh tying problem

- Plate with a hole pulled on two opposite sides;
- Two meshes glued with the Mortar finite element method in saddle point formulation;
- Lamé parameters represented as a Gaussian random field;
- Multigrid preconditioner with saddle point matrix on each multigrid level;
- Solved on Intel(R) Xeon(R) Platinum 8160 CPU.

[Figure: plate geometry with the loads t, and two realizations of the random field (Realization 1, Realization 2).]

Test case 1: speed-up of one GMRES iteration

[Figure: speed-up S as a function of the ensemble size (1 to 32) for the sparse matrix-vector product, the preconditioner, the orthogonalization with reduction, and the orthogonalization without reduction.]

Test case 1: total speed-up

[Figure, left: number of iterations for each sample, for ensemble sizes 1, 8, 16 and 32, with and without reduction.]

[Figure, right: speed-up S as a function of the ensemble size (1 to 32), with and without reduction.]

Test case 2: first results on the ITER mirror

[Figure: panels "MPI rank" and "Temperature" for the ITER mirror model.]

[Figure: speed-up S as a function of the ensemble size (1 to 32), with and without reduction.]

Conclusion

Conclusion and contributions:

- Two variants of GMRES can currently be used: with reduced inner product and with ensemble-typed inner product;
- The cost per iteration of ensemble GMRES is the same with and without ensemble reduction, that is, whether or not the samples are coupled together;
- Ensemble GMRES without reduction is faster than ensemble GMRES with reduction because of its improved convergence.

Future work:

- Finalize the application of the method on engineering problems relevant for ITER in collaboration with FZ Jülich;
- Finalize the testing on more than one computational node to leverage the increased memory usage.

References

- M. D’Elia, E. T. Phipps, A. Rushdi, and M. S. Ebeida, Surrogate-based Ensemble Grouping Strategies for Embedded Sampling-based Uncertainty Quantification. arXiv preprint arXiv:1705.02003, 2017.
- K. Liegeois, R. Boman, E. T. Phipps, T. Wiesner, and M. Arnst, Efficient GMRES with Ensemble Propagation: Application to non-Symmetric-Positive-Definite Mechanical Problems. In preparation.
- E. T. Phipps, M. D’Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures. SIAM Journal on Scientific Computing, 2017, vol. 39, no. 2, p. C162-C193.

Acknowledgement

The first author, Kim Liegeois, would like to acknowledge the Belgian National Fund for Scientific Research (FNRS-FRIA) and the Federation Wallonia-Brussels (FW-B) for their financial support.
