# Supercomputing in Computational Science and Engineering

Post on 28-Jan-2018


1. Supercomputing in Computational Science and Engineering. Santiago Badia, Universitat Politècnica de Catalunya & CIMNE. May 26th, 2016
2. Outline: 1 Computational Science and Engineering; 2 Supercomputing in CSE; 3 Multilevel solvers (LSSC-CIMNE); 4 Space-time solvers (LSSC-CIMNE); 5 Conclusions and future work
4. Computational science/engineering. Definition: the development of computational models and algorithms to solve (complex) physical problems in science and engineering. Ingredients: mathematical modelling (my focus: PDE-based CSE), numerical mathematics (discretization, solvers, ...), high performance computing. Applications in virtually every discipline (climate modelling, subsurface modelling, geophysics, aeronautics, ...)
5. Computational science/engineering. Let us consider the simplest problem, i.e., the Poisson problem: −Δu = f on Ω, plus boundary conditions. The solution u belongs to the ∞-dimensional space H¹₀(Ω) (functional analysis). Replace it by a finite-dimensional space span{v₁, ..., vₙ} (one basis function per node in a mesh), e.g., using finite element methods. The PDE is now approximated by a linear system AU = F, where U = [u₁, ..., uₙ] is the vector of nodal values; A and F are obtained by numerical integration (in every FE).
6. Simulation pipeline. Step 1: mesh generation (2D/3D). Step 2: integration (e.g., using FEs) in space and time: Aᵢⱼ = ∫_Ω ∇vᵢ · ∇vⱼ dΩ. Step 3: solve the (non)linear system(s) AU = F (sparse system).
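The three-step pipeline above can be sketched in a few lines for the simplest possible setting. The following is a minimal illustration (not the deck's code), assuming a 1D Poisson problem −u'' = f on (0,1) with homogeneous Dirichlet conditions, linear finite elements on a uniform mesh, and midpoint quadrature; all function names are hypothetical:

```python
import numpy as np

def assemble_poisson_1d(n, f=lambda x: 1.0):
    """Assemble A U = F for -u'' = f on (0,1), u(0) = u(1) = 0,
    with n linear finite elements on a uniform mesh."""
    h = 1.0 / n
    nodes = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    F = np.zeros(n + 1)
    for e in range(n):  # loop over elements (Step 2: integration)
        Ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # local stiffness
        xm = 0.5 * (nodes[e] + nodes[e + 1])                   # midpoint quadrature
        Fe = 0.5 * h * f(xm) * np.ones(2)                      # local load
        A[e:e + 2, e:e + 2] += Ke
        F[e:e + 2] += Fe
    # Homogeneous Dirichlet BCs: keep interior nodes only
    return A[1:-1, 1:-1], F[1:-1], nodes[1:-1]

A, F, x = assemble_poisson_1d(8)
U = np.linalg.solve(A, F)  # Step 3: solve the (sparse) system
```

For f = 1 the exact solution is u(x) = x(1−x)/2, and 1D linear elements with exact load integration reproduce it at the nodes, which makes this a handy sanity check.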
7. Outline: 1 Computational Science and Engineering; 2 Supercomputing in CSE; 3 Multilevel solvers (LSSC-CIMNE); 4 Space-time solvers (LSSC-CIMNE); 5 Conclusions and future work
8. Current trends of supercomputing. Transition from today's 10 Petaflop/s supercomputers (SCs) to exascale systems w/ 1 Exaflop/s expected in 2020. The 100× performance gain is based on concurrency (not higher frequency). Future: multi-million-core (in a broad sense) SCs.
10. Impact in scientific computing (I). Exploiting multi-million concurrency in CSE is a formidable task, but it poses a great opportunity: simulating problems out of reach now. Many initiatives about exascale scientific software. Climate modelling (source: ACME project, DOE); combustion modelling (source: EXACT project, DOE).
11. Mesh generation. The main bottleneck in CSE (not as much effort invested as in solvers). Hard to parallelize (Delaunay triangulation).
12. Adaptive mesh refinement. Couples automatic mesh adaptation with error estimation. Parallelization can be exploited for mesh generation and load balancing. Octree-based software (highly parallel implementations on 100,000+ procs, e.g., the p4est library). Problem: how to deal with complex geometries?
13. Numerical integration. Numerical integration is embarrassingly parallel, but it requires a partition of the mesh among procs. Mesh partitioners are based on graph partitioners (not highly scalable). With AMR, space-filling curves can be used (extremely scalable).
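A space-filling-curve partition is easy to sketch: give each cell a Morton (Z-order) index by interleaving the bits of its integer coordinates, sort the cells along the curve, and cut the curve into equal chunks. A minimal illustration (function names are hypothetical, not from p4est or any other library):

```python
def morton2d(ix, iy, bits=16):
    """Morton (Z-order) index: interleave the bits of the cell coordinates."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)        # x bits -> even positions
        code |= ((iy >> b) & 1) << (2 * b + 1)    # y bits -> odd positions
    return code

def partition_cells(cells, nparts):
    """Sort cells along the Z-order curve and cut it into equal chunks:
    each part is contiguous on the curve and roughly load balanced."""
    ordered = sorted(cells, key=lambda c: morton2d(*c))
    chunk = -(-len(ordered) // nparts)  # ceiling division
    return [ordered[i * chunk:(i + 1) * chunk] for i in range(nparts)]

# 8x8 structured grid of cells split among 4 "processors"
cells = [(i, j) for i in range(8) for j in range(8)]
parts = partition_cells(cells, 4)
```

Because the curve preserves locality, each chunk is a compact blob of cells, and the partition needs only a sort, which is why this scales so much better than graph partitioning.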
14. (Non)linear solvers. Step 1: linearization, via Newton's method with line search (Jacobian needed) or Anderson acceleration (Jacobian not needed). Step 2: linear system solve, via sparse direct methods (CPU and memory demanding; scalability beyond 10,000 procs not possible) or iterative methods (less CPU and memory demanding, able to scale on the largest supercomputers). The "first linearize, then solve" paradigm is changing now (nonlinear preconditioning and solvers).
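Step 1 above, in its classical form, is Newton's method: repeatedly solve the linearized system J(u) δu = −R(u) and update u. A minimal sketch (no line search; names are hypothetical):

```python
import numpy as np

def newton(residual, jacobian, u0, tol=1e-10, maxit=20):
    """Newton's method for R(u) = 0: solve J(u) du = -R(u), update u."""
    u = np.array(u0, dtype=float)
    for it in range(maxit):
        r = residual(u)
        if np.linalg.norm(r) < tol:
            return u, it
        du = np.linalg.solve(jacobian(u), -r)  # Step 2 hides here
        u = u + du
    return u, maxit

# Toy nonlinear problem: u^2 - 2 = 0 (root is sqrt(2))
R = lambda u: np.array([u[0] ** 2 - 2.0])
J = lambda u: np.array([[2.0 * u[0]]])
u, its = newton(R, J, [1.0])
```

Each Newton step contains a full Step-2 linear solve, which is why the scalability of the linear solver dominates the whole pipeline.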
15. Weakly scalable solvers. This talk: one challenge, weakly scalable algorithms. Weak scalability: if we increase the number of processors X times, we can solve an X times larger problem. This is the key property to face more complex problems and increase accuracy. (Sources: Dey et al., 2010; parFE project)
16. Weakly scalable solvers. Weak scalability is NOT ONLY a computer science issue: it requires a synergy of advanced mathematical algorithms and implementations. Naive iterative solvers neither scale in theory nor in practice. [Plot: number of PCG iterations for the interface problem vs. #subdomains, weak scaling with H/h=256; CG with no preconditioning (left) vs. one-level DD (NN) preconditioner (right); rtol=1.0e-06, atol=0.0]
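The effect in the plot can be reproduced in miniature: run unpreconditioned CG on 1D Laplacians of growing size and watch the iteration count grow with the problem, i.e., no weak scalability without a good preconditioner. A sketch (not the deck's solver; names are hypothetical):

```python
import numpy as np

def laplacian_1d(n):
    """Dense 1D Laplacian (tridiagonal 2, -1): a stand-in for A U = F."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def cg_iterations(A, b, rtol=1e-6):
    """Unpreconditioned conjugate gradients; return iterations to reach rtol."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    nb = np.linalg.norm(b)
    for it in range(1, 10 * len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < rtol * nb:
            return it
        p = r + (rs_new / rs) * p
        rs = rs_new
    return it

# Iteration counts for growing problem sizes: not weakly scalable
its = [cg_iterations(laplacian_1d(n), np.ones(n)) for n in (16, 32, 64)]
```

The condition number of the Laplacian grows with n, so the iteration count climbs as the problem is refined, which is exactly the left-hand curve in the plot above.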
18. Outline: 1 Computational Science and Engineering; 2 Supercomputing in CSE; 3 Multilevel solvers (LSSC-CIMNE); 4 Space-time solvers (LSSC-CIMNE); 5 Conclusions and future work
19. Scalable linear solvers (AMG). The most scalable solvers for CSE are parallel AMG (Trilinos [Lin, Shadid, Tuminaro, ...], Hypre [Falgout, Yang, ...], ...). Hard to scale up to the largest SCs today (one million cores, < 10 PFs). Largest problems: ~10¹¹ unknowns for unstructured meshes (we are there) and ~10¹³ for structured meshes. Problems: large communication/computation ratios at coarser levels, densification of coarser problems, ...
20. DD for HPC. The domain decomposition framework is more than a linear solver: a divide-and-conquer approach. Not general purpose (grid-based methods). Scalable 2-level DD: BNN [Mandel '93], FETI-DP [Farhat et al. '01], BDDC [Dohrmann '03, ...]. Weakly scalable on paper, harder in practice on > 10,000 procs. Multilevel DD methods are recent, e.g., MLBDDC [Mandel et al. '08].
23. Multilevel framework. MLDD is based on a hierarchy of meshes/functional spaces. It involves local subdomain problems at all levels (L1, L2, ...). [Figures: FE mesh; subdomains (L1); subdomains (L2)]
25. Outline. Motivation: develop a multilevel framework for scalable linear algebra (MLBDDC). All implementations in FEMPAR (in-house code), to be distributed as open-source SW soon*. (* Funded by Starting Grant 258443 COMFUS and Proof of Concept Grant 640957 FEXFEM: On a free open source extreme scale finite element software)
26. Preliminaries. Element-based (non-overlapping DD) distribution (+ limited ghost info): T_h → {T_h¹, T_h², T_h³}. Gluing info is based on objects. Object: a maximal set of interface nodes that belong to the same set of subdomains.
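The object classification above can be sketched directly from a node-to-subdomains map: group interface nodes by the exact set of subdomains that share them. A minimal illustration, with a hypothetical 2×2 subdomain layout (not FEMPAR's data structures):

```python
from collections import defaultdict

def interface_objects(node_subdomains):
    """Group interface nodes into objects: maximal sets of nodes that
    belong to exactly the same set of subdomains."""
    objects = defaultdict(list)
    for node, subs in node_subdomains.items():
        if len(subs) > 1:  # only interface nodes form objects
            objects[frozenset(subs)].append(node)
    return dict(objects)

# Hypothetical node -> subdomain sets for a 2x2 subdomain layout
node_subs = {
    1: {1, 2},          # on the edge between subdomains 1 and 2
    2: {1, 2},
    3: {1, 3},          # on the edge between subdomains 1 and 3
    4: {1, 2, 3, 4},    # the cross point shared by all four
    5: {1},             # interior node: belongs to no object
}
objs = interface_objects(node_subs)
```

Nodes 1 and 2 end up in one edge object, node 3 in another, and node 4 alone in a corner object; interior nodes never appear, which matches the definition on the slide.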
28. Coarser triangulation. Similar to the FE triangulation object, but w/o a reference element. Instead, aggregation info: objects at level 1 = aggregation of VEFs (vertices, edges, faces) at level 0.
29. Coarse corner function. Circle domain partitioned into 9 subdomains; V1 corner basis function.
30. Coarse edge function. Circle domain partitioned into 9 subdomains; V1 edge basis function.
32. Multilevel/scale concurrency. The fine component is local to every subdomain (parallel); the coarse global problem is the bottleneck. But we can develop coarse problems that are orthogonal to fine ones: overlapped implementations. Multilevel concurrency is BASIC for extreme-scale implementations.
33. Multilevel concurrency. [Diagram: timeline with P0 = {P1, P2}] L1 duties are fully parallel. L2 duties destroy scalability because #L1 procs ≈ 1000 × #L2 procs, and the L2 problem size increases w/ the number of procs.
34. Multilevel concurrency. [Diagram: timeline with P0 = {P1, P2, P3}] Every processor has duties at one level/scale. Idling is dramatically reduced (energy-aware solvers). Overlapped communications/computations among levels.
35. Multilevel concurrency. Inter-level overlapped bulk-asynchronous (MPMD) implementation in FEMPAR.
36. FEMPAR implementation. Multilevel extension is straightforward (starting the algorithm with V1 and the level-1 mesh). [Diagram: cores 1, 2, 3, 4, ..., P in the 1st-level MPI communicator; transcript truncated here]