Fine-grained parallel iterative solvers in OpenFOAM - utilizing multi-core and GPU devices with PARALUTION

Dimitar Lukarski

Division of Scientific Computing
Department of Information Technology
Uppsala Programming for Multicore Architectures Research Center (UPMARC)
Uppsala University, Sweden

Dutch OpenFOAM Users Group Sept, 2013

Lukarski, Dutch OpenFOAM Users Group, Sept, 2013

Outline

Motivation and Introduction

PARALUTION Library

Solvers/Preconditioners

Why Do We Need Fine-grained Solvers in OpenFOAM?

Integration with OpenFOAM

Performance

Conclusion

Motivation and Introduction

Multi/many-core systems are here

- CPUs, GPUs, accelerators
- Trends: more cores, e.g. CPU (16), MIC (60), GPU (2496)
- High performance, better performance/watt ratio

Scientific computing is expected to be

- Fast: use this new technology
- Scalable: use many cores (parallel)
- Sustainable over years

What is the Problem?

New algorithms

- High degree of parallelism
- Fine-grained parallelism
- Scalable = almost no communication

Software

- New programming models/languages
- Hardware-specific optimization techniques

Good Parallel Programming is HARD

[Figure: chart with the axes Performance, Complexity, 1/Portability and 1/Productivity, comparing sequential (Seq), multi-core, many-core and multi+many-core programming.]

Methods are the MOST Important

Goal

Create a library for iterative methods (linear and non-linear systems)!

Math

- Linear operators (sparse/dense matrices, stencils)
- Vector routines
- Extendable

Software

- Current multi/many-core devices, new hardware
- Portable code

PARALUTION - Library for Iterative Methods

- Hardware abstraction
  - Targeting devices: CPUs + accelerators (GPUs, ...)
  - Run-time type identification (RTTI)
- Portable results and code
- Easy to use
  - No knowledge of OpenMP/CUDA/OpenCL required
  - No special library/hardware required
- Open source (GPLv3)

Middle-ware

[Diagram: PARALUTION as a middle-ware layer between scientific libraries/packages (C++ code, FORTRAN, OpenFOAM, Deal II) and multi/many-core hardware (CPU, GPU, new upcoming technology).]

Operators and Vectors

[Diagram: the two object families.]

- Operators: Global Matrix, Local Matrix, Global Stencil, Local Stencil
- Vectors: Global Vector, Local Vector

Local Vectors

- Init, Clear
- Standard vector routines
  - Dot product
  - Vector updates
  - Norm
  - ...
- Permutations
- Copy (sub-)vector to (sub-)vector
- File I/O
- Raw access via pointers

Local Matrices

- Init, Clear
- Formats: CSR, MCSR, COO, ELL, DIA, HYB, DENSE
- Conversion
- Matrix-vector multiplication
- Permutation
- Extract a sub-matrix
- File I/O
- Graph analyzer (CPU)
  - Multi-coloring
  - Maximal independent set
- Factorization
- Some wrappers to Intel MKL
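
To make the matrix formats and the matrix-vector multiplication listed above more concrete, here is a minimal sketch of a CSR matrix and its product y = A x in plain C++. It is an illustration only and not PARALUTION code: `CsrMatrix`, `csr_apply` and `laplace_1d` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a matrix in CSR (compressed sparse row) format.
struct CsrMatrix {
    std::size_t nrow = 0;
    std::size_t ncol = 0;
    std::vector<std::size_t> row_ptr;  // size nrow + 1, start of each row
    std::vector<std::size_t> col_idx;  // size nnz, column of each entry
    std::vector<double> val;           // size nnz, value of each entry
};

// y = A * x. Each row is independent of the others, so the outer loop
// is where fine-grained parallelism (e.g. one thread per row) applies.
std::vector<double> csr_apply(const CsrMatrix& A, const std::vector<double>& x) {
    assert(x.size() == A.ncol);
    assert(A.row_ptr.size() == A.nrow + 1);
    std::vector<double> y(A.nrow, 0.0);
    for (std::size_t i = 0; i < A.nrow; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.val[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}

// Helper for the usage example: 1D Laplacian with stencil [-1 2 -1].
CsrMatrix laplace_1d(std::size_t n) {
    CsrMatrix A;
    A.nrow = n;
    A.ncol = n;
    A.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) { A.col_idx.push_back(i - 1); A.val.push_back(-1.0); }
        A.col_idx.push_back(i);
        A.val.push_back(2.0);
        if (i + 1 < n) { A.col_idx.push_back(i + 1); A.val.push_back(-1.0); }
        A.row_ptr.push_back(A.col_idx.size());
    }
    return A;
}
```

Calling `csr_apply(laplace_1d(4), x)` with a vector of ones gives a result that is non-zero only in the first and last row, since the interior rows of the stencil sum to zero.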

Operators and Vectors

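[Diagram: Operators/Vectors select their backend through a dynamic switch.]

- Host: OpenMP
- Accelerator: GPU (CUDA)
- Accelerator: GPU and other accelerators (OpenCL)
- Accelerator: Intel MIC (OpenMP)
- New backend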

Source Example

LocalMatrix<double> A;
LocalVector<double> x, y;

A.ReadFileMTX("my_matrix.mtx");
// For y = A x: x has one entry per column of A, y one entry per row
x.Allocate("vector1", A.get_ncol());
y.Allocate("vector2", A.get_nrow());

// y = A x
A.Apply(x, &y);

// Print the dot product of x and y
std::cout << x.Dot(y) << std::endl;

Source Example

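LocalMatrix<double> A;
LocalVector<double> x, y;

A.ReadFileMTX("my_matrix.mtx");
// For y = A x: x has one entry per column of A, y one entry per row
x.Allocate("vector1", A.get_ncol());
y.Allocate("vector2", A.get_nrow());

// Move all objects to the accelerator; the calls below stay unchanged
A.MoveToAccelerator();
x.MoveToAccelerator();
y.MoveToAccelerator();

A.Apply(x, &y);
std::cout << x.Dot(y) << std::endl;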

Initialization/Shutdown

#include <paralution.hpp>

using namespace paralution;

int main(int argc, char* argv[]) {
  init_paralution();
  info_paralution();

  // ...

  stop_paralution();
}

Solvers

[Diagram: overview of the available solvers and preconditioners.]

Solvers

- CG, BiCGStab, GMRES
- Fixed-iteration schemes
- Chebyshev iteration
- Mixed-precision defect-correction
- Multigrid: AMG/GMG
- Iteration control

Preconditioners

- Incomplete LU
- Chebyshev
- MultiColored ILU
- MultiColored GS/SGS/SOR/SSOR
- MultiElimination ILU

CG Solver

CG<LocalMatrix<double>, LocalVector<double>, double> ls;

ls.SetOperator(mat);
ls.Build();

ls.Solve(rhs, &x);

CG Solver

....

ls.MoveToAccelerator();
mat.MoveToAccelerator();
rhs.MoveToAccelerator();
x.MoveToAccelerator();

ls.Solve(rhs, &x);

PCG Solver

CG<LocalMatrix<double>, LocalVector<double>, double> ls;
MultiColoredILU<LocalMatrix<double>, LocalVector<double>, double> p;

ls.SetOperator(mat);
ls.SetPreconditioner(p);
ls.Build();

ls.Solve(rhs, &x);

PCG Solver (same procedure)

....

ls.MoveToAccelerator();
mat.MoveToAccelerator();
rhs.MoveToAccelerator();
x.MoveToAccelerator();

ls.Solve(rhs, &x);

Design and Concepts

Hardware decision

- At run time
- No template parameter

MoveToHost/Accelerator

- All objects (matrices, vectors, solvers, preconditioners, etc.)
- Always a CPU implementation

Template

- ValueType: float, double, int
- Solvers: Operator/Vector type
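
One way to picture a hardware decision made at run time, without a template parameter, is a handle object that swaps its implementation pointer. The sketch below is only an illustration under that assumption and is not taken from the PARALUTION sources: `Vector`, `VectorBackend`, `HostVector` and `FakeAcceleratorVector` are hypothetical names, and the "accelerator" class simply reuses the host code so that the example runs on any machine.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend interface: every backend implements the same routines.
class VectorBackend {
public:
    virtual ~VectorBackend() = default;
    virtual std::string name() const = 0;
    virtual const std::vector<double>& data() const = 0;
    virtual double dot(const VectorBackend& other) const = 0;
};

// Host (CPU) implementation, which is always available.
class HostVector : public VectorBackend {
public:
    explicit HostVector(std::vector<double> v) : v_(std::move(v)) {}
    std::string name() const override { return "host"; }
    const std::vector<double>& data() const override { return v_; }
    double dot(const VectorBackend& other) const override {
        const std::vector<double>& w = other.data();
        assert(w.size() == v_.size());
        double s = 0.0;
        for (std::size_t i = 0; i < v_.size(); ++i) s += v_[i] * w[i];
        return s;
    }
private:
    std::vector<double> v_;
};

// Stand-in for a device backend. A real one would keep the data in device
// memory and launch kernels; this one reuses the host code (assumption).
class FakeAcceleratorVector : public HostVector {
public:
    explicit FakeAcceleratorVector(std::vector<double> v) : HostVector(std::move(v)) {}
    std::string name() const override { return "accelerator"; }
};

// User-facing type: the hardware is not part of the type, only of the state.
class Vector {
public:
    explicit Vector(std::vector<double> v) : impl_(new HostVector(std::move(v))) {}
    void MoveToAccelerator() { impl_.reset(new FakeAcceleratorVector(impl_->data())); }
    void MoveToHost() { impl_.reset(new HostVector(impl_->data())); }
    double Dot(const Vector& other) const { return impl_->dot(*other.impl_); }
    std::string Where() const { return impl_->name(); }
private:
    std::unique_ptr<VectorBackend> impl_;
};
```

User code that calls `Dot` looks the same before and after `MoveToAccelerator()`; only the object behind the pointer changes.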

Design and Concepts

Functionality on the Accelerators

- Not all algorithms can be performed on the accelerator
- Examples: greedy multi-coloring, maximal independent set

The library has an internal mechanism to check whether a routine can be performed on the accelerator. If not, the object is moved to the host and the routine is performed there.

- Always a CPU implementation
- A warning is printed if the object needs to be moved for the routine
- 100% portability of the code!
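
Greedy multi-coloring is named above as a routine that stays on the host. The following sketch shows the textbook greedy algorithm on the adjacency graph of a sparse matrix, to illustrate why it is inherently sequential: each row's color depends on the colors already given to its neighbours. It is not PARALUTION's implementation; `greedy_multicoloring` and the adjacency-list input are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy coloring of a sparsity pattern. adj[i] lists the columns j != i
// with a non-zero A(i,j); the pattern is assumed to be symmetric.
// Rows that share a color have no coupling and can be updated in parallel.
std::vector<int> greedy_multicoloring(const std::vector<std::vector<std::size_t>>& adj) {
    const std::size_t n = adj.size();
    std::vector<int> color(n, -1);  // -1 means "not colored yet"
    std::vector<char> used;         // used[c] != 0 if a neighbour has color c
    for (std::size_t i = 0; i < n; ++i) {
        // A row with d neighbours needs at most d + 1 candidate colors.
        used.assign(adj[i].size() + 1, 0);
        for (std::size_t j : adj[i]) {
            assert(j < n);
            const int c = color[j];
            if (c >= 0 && static_cast<std::size_t>(c) < used.size()) used[c] = 1;
        }
        int c = 0;
        while (used[c]) ++c;  // smallest color not taken by a neighbour
        color[i] = c;
    }
    return color;
}
```

For a chain of five unknowns (1D three-point stencil) this yields the alternating red-black pattern 0, 1, 0, 1, 0, so two colors suffice and each color can be processed as one parallel sweep.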

Iterative Linear Solvers

[Diagram: the overview of solvers and preconditioners from the "Solvers" slide, shown again.]

- Preconditioner = a solver
- Flexible design

Solver as Preconditioner

- Every solver can be used as a preconditioner
- Set a looser stopping criterion
- and/or a fixed number of iterations

Mostly used

- GS, SOR or SGS, SSOR
- GMG/AMG
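
The idea of a solver with a fixed number of iterations acting as a preconditioner can be sketched in plain C++. The example below runs preconditioned CG on a 1D Laplacian and uses a fixed number of Jacobi sweeps, started from zero, as the preconditioner. Jacobi is chosen here only because it is short to write; the matrix, the function names (`apply_A`, `jacobi_sweeps`, `pcg`) and all parameters are assumptions for illustration, not the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Matrix-free 1D Laplacian (stencil [-1 2 -1]) standing in for the system matrix.
Vec apply_A(const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0) y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// "Solver as preconditioner": a fixed number of Jacobi sweeps for A z = r,
// always started from z = 0 so that the operation is the same in every call.
Vec jacobi_sweeps(const Vec& r, int sweeps) {
    Vec z(r.size(), 0.0);
    for (int s = 0; s < sweeps; ++s) {
        const Vec Az = apply_A(z);
        for (std::size_t i = 0; i < z.size(); ++i) {
            z[i] += (r[i] - Az[i]) / 2.0;  // diag(A) = 2
        }
    }
    return z;
}

// Preconditioned CG. x holds the initial guess on entry and the result on
// exit. Convergence is tested at the start of each iteration; the return
// value is the iteration index at which the test passed, or max_iter.
int pcg(const Vec& b, Vec& x, int sweeps, double abs_tol, int max_iter) {
    assert(b.size() == x.size());
    const std::size_t n = b.size();
    Vec r = apply_A(x);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - r[i];
    Vec z = jacobi_sweeps(r, sweeps);
    Vec p = z;
    double rz = dot(r, z);
    for (int it = 0; it < max_iter; ++it) {
        if (std::sqrt(dot(r, r)) <= abs_tol) return it;
        const Vec Ap = apply_A(p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        z = jacobi_sweeps(r, sweeps);
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return max_iter;
}
```

Starting the inner sweeps from zero keeps the preconditioner a fixed, symmetric linear operation, which plain CG relies on. An inner solver that stops on a tolerance instead changes from call to call and generally calls for a flexible outer method.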

PCG-AMG

CG<LocalMatrix<double>, LocalVector<double>, double> ls;
AMG<LocalMatrix<double>, LocalVector<double>, double> p;

p.InitMaxIter(2);

ls.SetPreconditioner(p);
ls.SetOperator(mat);

ls.Build();

Geometric Multigrid

Your application constructs

- Prolongation/restriction operators or a neighbor mapping vector

Pass it to PARALUTION and obtain

- Geometric multigrid
- Internally constructed smoothers
- You can construct or pass the system matrices on all levels

Why Are Fine-grained Solvers Important?

NTNU HPC Group report (Performance on Vilje)

- Incompressible NS problem
- Which to use, PCG or GAMG?
- Small cluster: use GAMG
- Large cluster: use PCG

What is large and what is small?

- About 144 cores
- But this is only 9 nodes?

OpenFOAM plug-in

- Compile PARALUTION and the OpenFOAM plug-in
- Set the links to the OpenFOAM plug-in
- Configure the linear solver (txt file)

What does the plug-in do?

- Gets the matrix
- Gets the rhs/initial solution
- Runs a linear solver (any kind!)
- Returns the solution vector
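
The "gets the matrix" step implies a format conversion, because OpenFOAM keeps its matrices in LDU form (a diagonal array plus lower and upper coefficients addressed per face) while the library works with formats such as CSR. The following sketch shows one way such a conversion could look. It is an assumption for illustration and not the plug-in's actual code: `LduMatrix`, `CsrMatrix` and `ldu_to_csr` are hypothetical, and plain arrays stand in for the OpenFOAM containers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LDU-style storage: one diagonal entry per cell and, for every internal
// face f between cells lower_addr[f] and upper_addr[f], two off-diagonals.
struct LduMatrix {
    std::vector<double> diag;
    std::vector<double> lower;  // entry A(upper_addr[f], lower_addr[f])
    std::vector<double> upper;  // entry A(lower_addr[f], upper_addr[f])
    std::vector<std::size_t> lower_addr;
    std::vector<std::size_t> upper_addr;
};

struct CsrMatrix {
    std::vector<std::size_t> row_ptr;
    std::vector<std::size_t> col_idx;
    std::vector<double> val;
};

// Two-pass conversion: count the entries per row, then scatter them.
// Columns inside a row are NOT sorted (diagonal first, then face order).
CsrMatrix ldu_to_csr(const LduMatrix& A) {
    const std::size_t n = A.diag.size();
    const std::size_t nf = A.lower_addr.size();
    assert(A.upper_addr.size() == nf);
    assert(A.lower.size() == nf && A.upper.size() == nf);

    CsrMatrix C;
    C.row_ptr.assign(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) C.row_ptr[i + 1] = 1;  // diagonal
    for (std::size_t f = 0; f < nf; ++f) {
        assert(A.lower_addr[f] < n && A.upper_addr[f] < n);
        ++C.row_ptr[A.lower_addr[f] + 1];
        ++C.row_ptr[A.upper_addr[f] + 1];
    }
    for (std::size_t i = 0; i < n; ++i) C.row_ptr[i + 1] += C.row_ptr[i];

    C.col_idx.resize(C.row_ptr[n]);
    C.val.resize(C.row_ptr[n]);
    // next[i] is the next free slot in row i.
    std::vector<std::size_t> next(C.row_ptr.begin(), C.row_ptr.end() - 1);
    auto put = [&](std::size_t row, std::size_t col, double v) {
        C.col_idx[next[row]] = col;
        C.val[next[row]] = v;
        ++next[row];
    };
    for (std::size_t i = 0; i < n; ++i) put(i, i, A.diag[i]);
    for (std::size_t f = 0; f < nf; ++f) {
        put(A.lower_addr[f], A.upper_addr[f], A.upper[f]);
        put(A.upper_addr[f], A.lower_addr[f], A.lower[f]);
    }
    return C;
}
```

A production version would also have to account for processor and boundary interfaces, which this sketch ignores.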

OpenFOAM plug-in

You can control it via the fvSolution file

- Solver / preconditioner
- Advanced solver/preconditioner settings
- Matrix format (CSR, MCSR, ELL, DIA, HYB)
- Stopping criteria
- Offload the computation to the accelerator or not

Current solvers

- (P)CG, (P)BiCGStab, (P)GMRES (available)
- AMG, GAMG (available)
- Mixed-precision (in-house)

OpenFOAM plug-in: AMG and GAMG

AMG

- Purely algebraic multigrid solver
- The building is performed mainly on the CPU
- Updating the AMG (e.g. for CFD problems) can be computed on the GPU (ongoing research work)
- The whole solution procedure is available for all backends
- Control over: number of levels, smoothers (pre/post, type), coarse solver, matrix formats

GAMG

- We import the restriction/prolongation operators
- Pass them to PARALUTION
- Build smoothers
- Call the multigrid solver (all backends)

Hardware/Software Configuration

- Hardware
  - Host: 2 x eight-core Xeon E5-2680 (NUMA)
  - GPU: K20 with ECC on
- PARALUTION backends
  - Host: OpenMP (16 threads)
  - GPU: CUDA
  - Identical source code!
- OpenFOAM
  - Host: 16 MPI processes

Laplace Problem

- 2D square domain, 9M unknowns
- laplacianFoam
- Absolute tolerance of 1e-10 based on L1 and L2 norm

Laplace Problem

PCG                    # iter   Time [s]   Speed-up
OpenFOAM DIC             2910      667        1.0×
OpenFOAM DIC MPI         3289      179        3.72×
PARALUTION HOST L1        809      132        5.05×
PARALUTION HOST L2        564      111        6.00×
PARALUTION GPU L1        4820       94.6      7.05×
PARALUTION GPU L2        3740       74.8      8.91×

CFD Problem

- Incompressible NS
- 6.8M 3D cavity (190x190x190)
- icoFoam
- delta t = 0.25e-5
- Pressure solver only
- Absolute tolerance of 1e-6 based on L1 and L2 norm

CFD Problem

Preliminary results (still under development)!

GAMG vs PCG-AMG*       # iter   Time [s]   Speed-up
OpenFOAM                 11.3     19.3        1.0×
OpenFOAM MPI             19.0      6.5        3.0×
PARALUTION OMP L1         6.5      5.3        3.6×
PARALUTION OMP L2         1.3      2.8        6.9×
PARALUTION GPU L1         6.5      3.6        5.4×
PARALUTION GPU L2         1.3      2.0        9.7×

PARALUTION

Library for iterative sparse methods

- Various iterative solvers
- Various preconditioners

Backends

- OpenMP, CUDA, OpenCL
- Open for new hardware
- Portable results and code

General

- No special library/hardware required
- Easy to use
- OpenFOAM plug-in!

Ongoing Research

PCG-AMG

- Special construction for time-dependent PDEs

Multi-node support

- MPI layer
- Special AMG for the MPI version

Thank You for Your Attention

PARALUTION

www.paralution.com
