
Page 1: Design and Optimization of OpenFOAM-based CFD Applications

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Amani AlOnazi∗, David E. Keyes∗, Alexey Lastovetsky†, Vladimir Rychkov†

∗Extreme Computing Research Center, KAUST, Thuwal, Saudi Arabia
†Heterogeneous Computing Laboratory, UCD, Dublin, Ireland

[email protected]

May 21, 2014

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 1 / 31

Page 2: Design and Optimization of OpenFOAM-based CFD Applications

Overview

1 Introduction
    OpenFOAM CFD Package
    OpenFOAM Applications

2 Performance Analysis
    laplacianFoam
    icoFoam
    Conjugate Gradient

3 Proposed Optimizations
    Hybrid CG Solver
    Hybrid Pipelined CG Solver
    Heterogeneous Decomposition

4 Experimental Results
    Heterogeneous Decomposition
    Speedup icoFoam
    Speedup laplacianFoam
    Hybrid Solvers Performance

5 Conclusion


Page 4: Design and Optimization of OpenFOAM-based CFD Applications

Motivation

Hardware changes have to be taken into account
    Parallelism and heterogeneity in modern hardware

Per-processor performance on heterogeneous systems

Algorithms and codes have to be redesigned

The heterogeneity of these platforms leads to several challenges, and much contemporary attention is devoted to new software solutions. This trend in HPC platforms invites a redesign of the CFD packages, or of the algorithms themselves, so that they use these platforms efficiently.

“I would rather have today’s algorithms on yesterday’s computers than vice versa.”

Page 5: Design and Optimization of OpenFOAM-based CFD Applications

OpenFOAM CFD Package

Open source Field Operation And Manipulation (OpenFOAM) is a library written in C++ used to solve PDEs

It covers a wide range of solvers employed in CFD, such as Laplace and Poisson equations, incompressible flow, multiphase flow, and user-defined models

Free, open source, parallel, modular, and flexible

Large, user-driven support community


Page 6: Design and Optimization of OpenFOAM-based CFD Applications

OpenFOAM Parallel Approach

MPI Bulk Synchronous Parallel Model

OpenFOAM uses MPI to provide parallel multi-processor functionality

The mesh and its associated fields are divided into subdomains and allocated to separate processors (domain decomposition)

Domain Decomposition Methods

It provides different utilities for partitioning the domain, such as:

Simple

Hierarchical

METIS

SCOTCH

The communication between the subdomains is handled implicitly within the package
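To make the idea of a geometric split concrete, here is a minimal sketch of a slab decomposition of a structured mesh along one axis. It is an illustration only, not OpenFOAM code: the function name `simple_decompose` and the structured `nx * ny * nz` mesh are assumptions made for the example. It reports, per subdomain, the number of cells and the number of faces shared with neighbouring subdomains, which is the data that has to be exchanged over MPI.

```python
def simple_decompose(nx, ny, nz, parts_x):
    """Split an nx*ny*nz structured mesh into `parts_x` slabs along x.

    Returns one (cells, interface_faces) tuple per subdomain.
    Hypothetical helper for illustration; not part of OpenFOAM.
    """
    base, extra = divmod(nx, parts_x)
    result = []
    for rank in range(parts_x):
        # The first `extra` slabs receive one additional layer of cells.
        layers = base + (1 if rank < extra else 0)
        cells = layers * ny * nz
        # Every cut between two slabs exposes ny*nz shared faces.
        neighbours = (rank > 0) + (rank < parts_x - 1)
        result.append((cells, neighbours * ny * nz))
    return result


for rank, (cells, faces) in enumerate(simple_decompose(100, 100, 100, 4)):
    print(f"rank {rank}: {cells} cells, {faces} interface faces")
```

Interior slabs have two neighbours and therefore twice the interface of the end slabs, which is one reason graph partitioners such as METIS and SCOTCH, which minimise the cut, are often preferred for irregular meshes.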


Page 7: Design and Optimization of OpenFOAM-based CFD Applications

OpenFOAM Selected Applications

icoFoam: The incompressible laminar Navier-Stokes equations can be solved by icoFoam, which applies the PISO algorithm in the time-stepping loop.

$$\nabla \cdot \mathbf{u} = 0$$

$$\frac{\partial \mathbf{u}}{\partial t} + \nabla \cdot (\mathbf{u}\mathbf{u}) - \nabla \cdot (\nu \nabla \mathbf{u}) = -\nabla p$$

Linear solvers: p: CG; u: Bi-CGSTAB

laplacianFoam: The solver finds the solution of the Laplace equation. The equation contains one variable, a passive scalar, for instance a temperature, T.

$$\frac{\partial T}{\partial t} - \nabla^2 (D_T \, T) = 0$$

Linear solver: T: CG


Page 9: Design and Optimization of OpenFOAM-based CFD Applications

Performance Analysis: Platform

A single node consisting of two sockets; each socket is connected to a GPU device (Tesla C2050)

Each socket holds an Intel Xeon X5670 CPU @ 2.93 GHz (6 cores)

Page 10: Design and Optimization of OpenFOAM-based CFD Applications

laplacianFoam: 3D Heat Equation

Heat equation test case solved over a three-dimensional slab geometry using laplacianFoam.

Page 11: Design and Optimization of OpenFOAM-based CFD Applications

icoFoam: Lid-driven Cavity Flow

The lid-driven cavity flow test case contains the solution of a laminar, isothermal and incompressible flow over a three-dimensional cubic geometry. The top boundary of the cube is a wall that moves in the x direction, whereas the rest are static walls.

Page 12: Design and Optimization of OpenFOAM-based CFD Applications

Conjugate Gradient 1 iteration

A single iteration of the conjugate gradient solver using a matrix derived from the 3D heat equation.
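To show which kernels make up one CG iteration, here is a minimal sketch of the textbook unpreconditioned method in plain Python. It is not the OpenFOAM solver: the 1D Laplacian stencil used as the matrix and the helper names `spmv_tridiag`, `dot`, and `cg` are assumptions chosen to keep the example self-contained. Each iteration contains one sparse matrix-vector product, two or three reductions, and three vector updates; in a distributed run every reduction is a global synchronization point.

```python
import math


def spmv_tridiag(x):
    """y = A x for the 1D Laplacian stencil [-1, 2, -1] (symmetric positive definite)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def dot(a, b):
    """Inner product; in parallel this is a global reduction."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(b, tol=1e-10, max_iter=1000):
    """Unpreconditioned conjugate gradient with x0 = 0. Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                      # r0 = b - A x0
    p = list(r)
    rr = dot(r, r)                   # reduction
    for k in range(max_iter):
        if math.sqrt(rr) < tol:
            return x, k
        q = spmv_tridiag(p)          # sparse matrix-vector product
        alpha = rr / dot(p, q)       # reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)           # reduction
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

Usage: build a right-hand side from a known solution, for example `b = spmv_tridiag([1.0] * 8)`, and check that `cg(b)` recovers the vector of ones.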


Page 14: Design and Optimization of OpenFOAM-based CFD Applications

Proposed Optimizations

Hybrid CG Solver

Hybrid Pipelined CG Solver

Heterogeneous Decomposition

“Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs.”

Page 15: Design and Optimization of OpenFOAM-based CFD Applications

Hybrid CG

Multicore/MultiGPU

CUDA kernel is based on CUSP and Thrust calls.

CSR format on the GPU and Skyline format on the CPU
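As a reminder of what the CSR (compressed sparse row) layout stores, here is a minimal sketch of the format and its matrix-vector product in plain Python. It is illustrative only and does not reproduce the CUSP kernels; the helper names `to_csr` and `csr_spmv` are assumptions made for the example.

```python
def to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x with A in CSR form; each row is an independent dot product."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Because every row can be processed independently, the outer loop maps naturally onto GPU threads, which is one common reason for choosing CSR on the device.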


Page 16: Design and Optimization of OpenFOAM-based CFD Applications

Hybrid Pipelined CG

Recently introduced by Ghysels and Vanroose∗

Reduces the global communication to one synchronization point per iteration instead of three

Offers better scalability at the price of extra computations

r0 = b − A x0
w0 = A r0
k = 0
while ‖rk‖2 ≥ τ do
    λk = dotc(rk, rk)
    δ = dotc(wk, rk)
    q = SpMV(A, wk)
    if (k > 0) then
        β = λk / λk−1
        α = λk / (δ − (β ∗ λk) / α)
    else
        β = 0
        α = λk / δ
    end if
    z = q + β ∗ z
    s = w + β ∗ s
    p = r + β ∗ p
    x = x + α ∗ p
    r = r − α ∗ s
    w = w − α ∗ z
    k = k + 1
end while

∗ P. Ghysels and W. Vanroose, Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Computing, 2013.
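The recurrences above can be sketched in plain Python as follows. This is a serial illustration under stated assumptions, not the hybrid CPU/GPU implementation from the talk: the 1D Laplacian test operator and the names `laplacian_1d`, `dot`, `axpy`, and `pipelined_cg` are hypothetical. The point to notice is that both inner products are computed from vectors that are already available before the matrix-vector product, so in an MPI setting they could be fused into a single reduction and overlapped with the SpMV.

```python
import math


def laplacian_1d(v):
    """y = A v for the 1D stencil [-1, 2, -1]; stands in for the solver matrix."""
    n = len(v)
    y = [2.0 * vi for vi in v]
    for i in range(n - 1):
        y[i] -= v[i + 1]
        y[i + 1] -= v[i]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def pipelined_cg(apply_a, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned pipelined CG with x0 = 0. Returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # r0 = b - A x0
    w = apply_a(r)               # w0 = A r0
    z, s, p = [0.0] * n, [0.0] * n, [0.0] * n
    lam_old = alpha = 1.0        # placeholders; not used when k == 0
    for k in range(max_iter):
        lam = dot(r, r)          # these two reductions can share
        delta = dot(w, r)        # one global communication step
        if math.sqrt(lam) < tol:
            return x, k
        q = apply_a(w)           # SpMV, independent of the reductions
        if k > 0:
            beta = lam / lam_old
            alpha = lam / (delta - beta * lam / alpha)
        else:
            beta = 0.0
            alpha = lam / delta
        z = axpy(beta, z, q)     # z = q + beta * z
        s = axpy(beta, s, w)     # s = w + beta * s
        p = axpy(beta, p, r)     # p = r + beta * p
        x = axpy(alpha, p, x)    # x = x + alpha * p
        r = axpy(-alpha, s, r)   # r = r - alpha * s
        w = axpy(-alpha, z, w)   # w = w - alpha * z
        lam_old = lam
    return x, max_iter
```

The extra vectors z, s, and w are the "extra computations" mentioned above: three additional vector updates per iteration are traded for fewer synchronization points.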

Page 17: Design and Optimization of OpenFOAM-based CFD Applications

Hybrid Pipelined CG

Multicore/MultiGPU

CUDA kernel is based on CUSP and Thrust calls.

CSR format on the GPU and Skyline format on the CPU

Page 18: Design and Optimization of OpenFOAM-based CFD Applications

Heterogeneous Decomposition

The main idea is to partition the domain and assign each subdomain to a processor in proportion to that processor's performance
    more powerful processors get larger subdomains

Combines the performance model with the METIS/SCOTCH library

The load is balanced if the number of computations performed by each processor is proportional to its speed in executing the kernel

$$s_i(n_i) = \frac{n_i + 2F_i}{T(n_i)}$$

$$r_i(n_i) = \frac{s_i(n_i)}{\sum_{j=1}^{p} s_j(n_j)}$$

$$n_{i,\mathrm{new}} = N \cdot r_i(n_i)$$
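A single step of this speed-proportional repartitioning can be sketched as below. The interpretation of the symbols is an assumption, since the slide does not define them: n_i is taken as the number of cells in subdomain i, F_i as its number of faces, T(n_i) as the measured kernel time, and N as the total number of cells. The function name `repartition` and the rounding rule are also hypothetical; in practice the resulting shares would presumably be passed to METIS or SCOTCH as processor weights, and the step may be repeated as new timings are measured.

```python
def repartition(cells, faces, times, total_cells):
    """Compute speeds s_i, relative speeds r_i, and new subdomain sizes.

    cells[i], faces[i]: current size of subdomain i (assumed meaning of n_i, F_i)
    times[i]:           measured kernel time T(n_i) on processor i
    total_cells:        N, the size of the whole mesh
    """
    speeds = [(n + 2 * f) / t for n, f, t in zip(cells, faces, times)]
    total_speed = sum(speeds)
    shares = [s / total_speed for s in speeds]
    sizes = [round(total_cells * r) for r in shares]
    # Assign any rounding remainder to the fastest processor so sizes sum to N.
    sizes[speeds.index(max(speeds))] += total_cells - sum(sizes)
    return speeds, shares, sizes


speeds, shares, sizes = repartition(
    cells=[1000, 1000], faces=[3000, 3000], times=[1.0, 3.0], total_cells=2000
)
print(sizes)
```

In this example the first processor runs the same kernel three times faster than the second, so it receives three quarters of the cells.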


Page 20: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Heterogeneous Decomposition

[Figure: average time per iteration (y-axis, 0 to 0.02) for two decomposition methods, METIS and Heterogeneous (x-axis). Series: Hybrid CG, Hybrid Pipelined CG.]

Page 21: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Speedup icoFoam I

N = 100k

[Figure: speedup (y-axis, 0 to 3.5) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]

Page 22: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Speedup icoFoam II

N = 1M

[Figure: speedup (y-axis, 0 to 2) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]

Page 23: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Speedup laplacianFoam I

N = 100k

[Figure: speedup (y-axis, 0 to 8) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]

Page 24: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Speedup laplacianFoam II

N = 1M

[Figure: speedup (y-axis, 0 to 4) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]

Page 25: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Performance

[Figure: performance in MFLOPs/s (y-axis, 1 to 512) versus problem size (x-axis, 10^3 to 10^7). Series: Hybrid Pipelined CG, Hybrid CG.]

Page 26: Design and Optimization of OpenFOAM-based CFD Applications

Experimental Results: Roofline

[Figure: roofline model. x-axis: FLOP:BYTE ratio (0.125 to 16); y-axis: attainable Gflop/s (1 to 512). Ceilings: Peak DP, DRAM BW, Stream DRAM BW, PCI BW.]
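The roofline bound itself is simple arithmetic: attainable performance is the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The sketch below applies it to a double-precision dot product. The peak and bandwidth numbers are assumptions, taken as nominal vendor figures for a Tesla C2050 (about 515 Gflop/s double precision and 144 GB/s) and not read from the slide; the function names are hypothetical.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, bandwidth * FLOP:BYTE ratio)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def dot_intensity(bytes_per_value=8):
    """FLOP:BYTE ratio of a dot product.

    Each pair of elements costs 2 flops (multiply and add)
    and 2 loads of `bytes_per_value` bytes each.
    """
    return 2.0 / (2 * bytes_per_value)


# Assumed nominal device figures; replace with measured values where available.
PEAK_DP_GFLOPS = 515.0
DRAM_BW_GBS = 144.0

ai = dot_intensity()
print(ai, attainable_gflops(ai, PEAK_DP_GFLOPS, DRAM_BW_GBS))
```

Under these assumptions a dot product sits at a ratio of 0.125 flop per byte, far to the left of the ridge point, so it is limited by memory bandwidth and not by the floating-point units.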


Page 28: Design and Optimization of OpenFOAM-based CFD Applications

Conclusion

Memory-bound applications, such as the selected OpenFOAM solvers, can take better advantage of the full hardware potential, which is now complex, hybrid and heterogeneous, if all resources are taken into account in a holistic approach.

The vector reduction kernel performs n × 1 memory transactions, where n is the vector size, and cannot be combined with other operations. This leads to low arithmetic intensity, low memory throughput, and poor scalability as the number of GPUs and CPUs increases.

There is a need for dynamic load-balancing scheduling that adaptively balances the workload at run time through memory-aware work stealing.

The experimental results show that the hybrid implementation of both solvers significantly outperforms state-of-the-art implementations of a widely used open source package.

Page 29: Design and Optimization of OpenFOAM-based CFD Applications

Thank You!


Page 30: Design and Optimization of OpenFOAM-based CFD Applications

Backup Slides.


Page 31: Design and Optimization of OpenFOAM-based CFD Applications

PISO Algorithm

Pressure Implicit with Splitting of Operators (PISO)*

1 Set the boundary conditions.
2 Solve the discretized momentum equation to compute an intermediate velocity field.
3 Compute the mass fluxes at the cell faces.
4 Solve the pressure equation.
5 Correct the mass fluxes at the cell faces.
6 Correct the velocities on the basis of the new pressure field.
7 Update the boundary conditions.
8 Repeat from 3 for the prescribed number of times.
9 Advance to the next time step and repeat from 1.
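The nesting of the two loops in the steps above can be sketched as a small schedule generator. It only records the order in which the numbered steps are visited and performs no numerical work; the function name `piso_schedule` and the parameter names are assumptions made for the example, with `n_correctors` standing for the "prescribed number of times" in step 8.

```python
def piso_schedule(n_time_steps, n_correctors):
    """Return the ordered list of (time_step, step_number) pairs visited by PISO.

    Steps 1 and 2 run once per time step; steps 3 to 7 form the
    corrector loop that step 8 repeats; step 9 is the outer loop itself.
    """
    trace = []
    for t in range(n_time_steps):
        trace.append((t, 1))          # set boundary conditions
        trace.append((t, 2))          # momentum predictor
        for _ in range(n_correctors):
            for step in (3, 4, 5, 6, 7):
                trace.append((t, step))
    return trace


print(piso_schedule(n_time_steps=1, n_correctors=2))
```

The pressure equation (step 4) is solved once per corrector pass, so its linear solver, CG in icoFoam, runs more often per time step than the momentum solver.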

* J. H. Ferziger, M. Peric, Computational Methods for Fluid Dynamics, Springer, 3rd Ed., 2001.
  H. Jasak, Error Analysis and Estimation for the Finite Volume Method with Applications to Fluid Flows, Ph.D. Thesis, Imperial College, London, 1996.
  http://openfoamwiki.net