fine-grained parallel iterative solvers in openfoam ... · fine-grained parallel iterative solvers...
TRANSCRIPT
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Fine-grained parallel iterative solvers inOpenFOAM - utilizing multi-core and
GPU devices with PARALUTION
Dimitar Lukarski
Division of Scientific ComputingDepartment of Information Technology
Uppsala Programming for Multicore Architectures Research Center(UPMARC)
Uppsala University, Sweden
Dutch OpenFOAM Users Group Sept, 2013
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Outline
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
Multi/many-core systems are here
I CPUs, GPUs, Accelerators
I Trends - more cores CPU(16), MIC(60), GPU(2496)
I High performance, better performance/watt ratio
Scientific computing is expected to be
I Fast - use this new technology
I Scalable - use many cores (parallel)
I Sustainable over years
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
What is the Problem?
New algorithms
I High degree of parallelism
I Fine-grained parallelism
I Scalable = almost no communication
Software
I New programming models/languages
I Hardware-specific optimization techniques
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Good Parallel Programming is HARD
Perf
orm
ance
Complexity
1/Portability
1/Productivity
Seq
Multi-core
Many-core
Multi+many-core
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Methods are the MOST Important
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Goal
Create a library for iterative methods (linear andnon-linear systems)!
Math
I Linear operators (sparse/dense matrices, stencils)
I Vector routines
I Extendable
Software
I Current multi/many-core dev, new hardware
I Portable code
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
PARALUTION - Library for Iterative Methods
I Hardware abstraction
I Targeting devices: CPUs + Accelerators (GPUs,...)I Run time type identification (RTTI)
I Portable results and code
I Easy to useI No knowledge of OpenMP/CUDA/OpenCL requiredI No special library/hardware required
I Open source (GPLv3)
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Middle-ware
PARALUTION
Multi/many-core
CPUGPU
New
up coming
technology
C++ code
FORTRAN
OpenFOAM
Deal II
Scientific
libraries/packages
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Operators and Vectors
Vectors
Global Matrix
Local Matrix
Global Stencil
Local Stencil
OperatorsGlobal Vector
Local Vector
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Local Vectors
I Init, ClearI Standard vector routines
I Dot productI Vector updatesI NormI ...
I Permutations
I Copy (sub-)vector to (sub-)vector
I I/O file
I Raw access via pointers
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Local Matrices
I Init, Clear
I Formats – CSR, MCSR, COO, ELL, DIA, HYB,DENSE
I Conversion
I Matrix-vector multiplication
I Permutation
I Extract a sub-matrix
I I/O fileI Graph analyzer (CPU)
I Multi-coloringI Maximal independent set
I Factorization
I Some wrapper to Intel/MKL
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Operators and Vectors
Operators/Vectors
OpenMP
Accelerator
GPU
CUDAGPU, Accelerators
OpenCL
Intel MIC
OpenMP
Dynamic
Switch
New
Backend
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Source Example
LocalMatrix<double> A;LocalVector<double> x, y ;
A.ReadFileMTX("my_matrix.mtx");x.Allocate("vector1", mat.get_nrow());y.Allocate("vector2", mat.get_ncol());
// y = A xA.Apply(x, &y);
// Print the dot product of x and ystd::cout << x.Dot(y) << std::endl;
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Source Example
LocalMatrix<double> A;LocalVector<double> x, y ;
A.ReadFileMTX("my_matrix.mtx");x.Allocate("vector1", mat.get_nrow());y.Allocate("vector2", mat.get_ncol());
A.MoveToAccelerator();x.MoveToAccelerator();y.MoveToAccelerator();
A.Apply(x, &y);std::cout << x.Dot(y) << std::endl;
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Initialization/Shutdown
#include <paralution.hpp>
using namespace paralution;
int main(int argc, char* argv[]) {
init_paralution();
info_paralution();
// ...
stop_paralution();
}
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Solvers
Solvers
Fixed-Iteration
SchemesCG, BiCGStab,
GMRES
Chebyshev
Iteration
Incomplete LU Chebyshev
MultiColored
ILU
MultiColored
GS/SGS/SOR/SSOR
Iteration Control
Preconditioners
MultiElimination
ILU Mixed-precision
Defect-correction
Multigrid
AMG/GMG
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
CG Solver
CG<LocalMatrix<double>,LocalVector<double>,double > ls;
ls.SetOperator(mat);
ls.Build();
ls.Solve(rhs, &x);
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
CG Solver
....
ls.MoveToAccelerator();mat.MoveToAccelerator();rhs.MoveToAccelerator();x.MoveToAccelerator();
ls.Solve(rhs, &x);
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
PCG Solver
CG<LocalMatrix<double>,LocalVector<double>,double > ls;
MultiColoredILU<LocalMatrix<double>,LocalVector<double>,double > p;
ls.SetOperator(mat);ls.SetPreconditioner(p);
ls.Build();
ls.Solve(rhs, &x);
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
PCG Solver (same procedure)
....
ls.MoveToAccelerator();mat.MoveToAccelerator();rhs.MoveToAccelerator();x.MoveToAccelerator();
ls.Solve(rhs, &x);
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Design and Concepts
Hardware decision
I At run time
I No template parameter
MoveToHost/Accelerator
I All objects (matrices, vectors, solvers,preconditioners, etc)
I Always a CPU implementation
Template
I ValueType - float, double, int
I Solvers - Operator/Vector type
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Design and Concepts
Functionality on the Accelerators
I Not all algorithms can be performed on theaccelerator
I Ex: greedy multi-coloring, maximal independent set
The library has an internal mechanism to check if aroutine can be performed on the accelerator or not. Ifnot, the object is moved to the host and the routine isperformed there.
I Always a CPU implementation
I Warning is printed if the object needs to be movedfor the routine
I 100% portability of the code!
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Iterative Linear Solvers
Solvers
Fixed-Iteration
SchemesCG, BiCGStab,
GMRES
Chebyshev
Iteration
Incomplete LU Chebyshev
MultiColored
ILU
MultiColored
GS/SGS/SOR/SSOR
Iteration Control
Preconditioners
MultiElimination
ILU Mixed-precision
Defect-correction
Multigrid
AMG/GMG
I Preconditioner = a solver
I Flexible design
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Solver as Preconditioner
I Every solver can be used as a preconditioner
I Set a lower stopping criteria
I or/and Fixed number of iterations
Mostly used
I GS, SOR or SGS, SSOR
I GMG/AMG
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
PCG-AMG
CG<LocalMatrix<double>,LocalVector<double>,double > ls;
AMG<LocalMatrix<double>,LocalVector<double>,double > p;
p.InitMaxIter(2);
ls.SetPreconditioner(p);ls.SetOperator(mat);
ls.Build();
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Geometric Multigrid
Your application constructs
I Prolongation/Restriction operators or neighbormapping vector
Pass it to PARALUTION and obtain
I Geometric Multigrid
I Internally constructed smoothers
I You can construct or pass the system matrices on alllevels
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Why Fine-grained Solvers Are Important?
NTNU HPC Group report (Performance on Vilje)
I Incompressible NS problem
I What to use PCG or GAMG?
I Small cluster - use GAMG
I Large cluster - use PCG
What is large and what is small?
I About 144
I But this is only 9 nodes?
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
OpenFOAM plug-in
I Compile PARALUTION and the OpenFOAM plug-in
I Set the links to the OpenFOAM plug-in
I Configure the linear solver (txt file)
What the plug-in does?
I Gets the matrix
I Gets the rhs/initial solution
I Runs a linear solver (any kind!)
I Returns the solution vector
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
OpenFOAM plug-in
You can control it via the fvSolution file
I Solver / Preconditioner
I Advanced solver/preconditioner settings
I Matrix format (CSR, MCSR, ELL, DIA, HYB)
I Stopping criteria
I Offload the computation to the accelerator or not
Current solvers
I (P)CG, (P)BiCGStab, (P)GMRES (available)
I AMG, GAMG (available)
I Mixed-precision (in-house)
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
OpenFOAM plug-in AMG and GAMG
AMG
I Purely algebraic multigrid solver
I The building is performed mainly on the CPU
I Updating the AMG (e.g. for CFD problems) can becomputed on the GPU (on going research work)
I The whole solution procedure is available for allbackends
I Control over - number of levels, smoothers(pre/post, type), coarse solver, matrix formats
GAMG
I We import restriction/prolongation operators
I Pass it to PARALUTION
I Build smoothers
I Call the multigrid solver (all backends)Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Hardware/Software Configuration
I HardwareI Host: 2 x eight-core Xeon E5-2680 (NUMA)I GPU: K20 with ECC on
I PARALUTION backendsI Host - OpenMP (16 threads)I GPU - CUDAI Identical source code!
I OpenFOAMI Host - 16 MPI proc
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Laplace Problem
I 2D square domain, 9M unknowns
I laplacianFoam
I absolute tolerance of 1e−10 based on L1 and L2norm
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Laplace Problem
PCG# iter Time [s] Speed-up
OpenFOAM DIC 2910 667 1.0×OpenFOAM DIC MPI 3289 179 3.72×PARALUTION HOST L1 809 132 5.05×PARALUTION HOST L2 564 111 6.00×PARALUTION GPU L1 4820 94.6 7.05×PARALUTION GPU L2 3740 74.8 8.91×
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
CFD Problem
I Incompressible NS
I 6.8M 3D Cavity (190x190x190)
I icoFoam
I delta t = 0.25e−5
I Pressure solver only
I absolute tolerance of 1e−6 based on L1 and L2 norm
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
CFD Problem
Preliminary results (still under development)!
GAMG vs PCG-AMG*# iter Time [s] Speed-up
OpenFOAM 11.3 19.3 1.0 ×OpenFOAM MPI 19.0 6.5 3.0 ×PARALUTION OMP L1 6.5 5.3 3.6 ×PARALUTION OMP L2 1.3 2.8 6.9 ×PARALUTION GPU L1 6.5 3.6 5.4 ×PARALUTION GPU L2 1.3 2.0 9.7 ×
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Motivation and Introduction
PARALUTION Library
Solvers/Preconditioners
Why Do We Need Fine-grained Solvers in OpenFOAM?
Integration with OpenFOAM
Performance
Conclusion
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
PARALUTION
Library for iterative sparse methods
I Various iterative solvers
I Various preconditioners
Backends
I OpenMP, CUDA, OpenCL
I Open for new hardware
I Portable results and code
General
I No special library/hardware required
I Easy to use
I OpenFOAM plug-in!
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013
Outline
Motivation and Intro
PARALUTION Lib
Solvers/Preconditioners
Why Fine-grained?
Integration in OpenFOAM
Performance
Conclusion
Ongoing Research
PCG-AMG
I Special construction for time-dependent PDE
Multi-node support
I MPI layer
I Special AMG for the MPI version
Lukarski, Dutch OpenFOAM Users Group, Sept, 2013