design and optimization of openfoam-based cfd applications

Post on 06-May-2022

15 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Design and Optimization of OpenFOAM-basedCFD Applications for Hybrid and Heterogeneous

HPC Platforms

Amani AlOnazi∗, David E. Keyes∗, Alexey Lastovetsky†, VladimirRychkov†

∗Extreme Computing Research Center, KAUST, Thuwal, Saudi Arabia,†Heterogeneous Computing Laboratory, UCD, Dublin, Ireland,

amani.alonazi@kaust.edu.sa

May 21, 2014

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 1 / 31

Overview

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 2 / 31

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 3 / 31

Motivation

Hardware changes have to be taken into accountsI Parallelism and heterogeneity in modern HW

Per-processor performance on heterogeneous systems

Algorithms and codes have to be redesigned

B The heterogeneity of these platforms leads to several challenges andmuch contemporary attention is devoted to new software solutions. Thistrend in the HPC platforms invites redesign of the CFD packages or thealgorithms themselves to use these platforms efficiently.

“ I would rather have today’s algorithms on yesterday’s computersthan vice versa.”

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 4 / 31

OpenFOAM CFD Package

Open source Field Operation And Manipulation (OpenFOAM) is a li-brary written in C++ used to solve PDEs

It covers a wide range of solvers employed in CFD, such as Laplaceand Poisson equations, incompressible flow, multiphase flow, and userdefined models

Free, open source, parallel, modular, and flexible

Large, user-driven support community

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 5 / 31

OpenFOAM Parallel Approach

MPI Bulk Synchronous Parallel Model

OpenFOAM uses MPI to provide parallel multi-processorsfunctionality

The mesh and its associated fields are divided into subdomains andallocated to separate processors (domain decomposition)

Domain Decomposition Methods

It provides different utilities for partitioning the domain, such as:

Simple

Hierarchal

METIS

SCOTCH

The communication between the subdomains is implicitly handled withinthe package

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 6 / 31

OpenFOAM Selected Applications

icoFoam The incompressible lam-inar Navier-Stokes equations canbe solved by icoFoam, which ap-plies the PISO algorithm in timestepping loop.

∇ � u = 0

∂u

∂t+∇ �(uu)−∇ �(ν∇u) = −∇p

p: CG

u: Bi-CGSTAB

laplacianFoam The solver is usedto find the solution of the Laplacianequation. The equation contains onevariable, a passive scalar, for instance,a temperature, T .

∂T

∂t−∇2(DT · T) = 0

T : CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 7 / 31

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 8 / 31

Performance Analysis: Platform

A single node consisting of twosockets, each socket is connectedto a GPU device (Tesla C2050)

The socket has 6 dual cores IntelXeon CPU X5670 @ 2.93GHz

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 9 / 31

laplacianFoam: 3D Heat Equation

Heat equation test case solved over a three-dimensional slab geometry usingthe laplacianFoam.

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 10 / 31

icoFoam: Lid-driven Cavity flowThe lid-driven cavity flow test case contains the solution of a laminar,isothermal and incompressible flow over a three-dimensional cubic geom-etry. The top boundary of the cube is a wall that moves in the x direction,whereas the rest are static walls.

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 11 / 31

Conjugate Gradient 1 iteration

A single iteration of the conjugate gradient solver using a matrix derivedfrom the 3D heat equation.

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 12 / 31

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 13 / 31

Proposed Optimizations

Hybrid CG Solver

Hybrid Pipelined CG Solver

Heterogenous Decomposition

“ Future computing architectures will be hybrid systems withparallel-core GPUs working in tandem with multi-core CPUs. ”

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 14 / 31

Hybrid CG

Multicore/MultiGPU

CUDA Kernel is based onCUSP and Thrust calls.

CSR Format in the GPU andSkyline Format in the CPU

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 15 / 31

Hybrid Pipelined CG

Recently introduced byGhysels and Vanroose∗Reduces the globalcommunication to 1 timeinstead of three

Offers better scalability atthe price of extracomputations

r0 = b − Axw0 = Ar0k = 0while ‖r‖2 ≥ τ do

λk = dotc(rk , rk )δ = dotc(wk , rk )q = SpMV (A,wk )if (k >0) then

β = λk / λk−1

α = λk / (δ - (β∗λk )/α)else

β = 0α = λk / δ

end ifz = q + β ∗ zs = w + β ∗ sp = r + β ∗ px = x + α ∗ pr = r − α ∗ sw = w − α ∗ z

end while

∗ P. Ghysels and W. Vanroose, Hiding global synchronization latency in the preconditioned con-

jugate gradient algorithm, Parallel Computing, 2013.Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 16 / 31

Hybrid Pipelined CG

Multicore/MultiGPU

CUDA Kernel is based onCUSP and Thrust calls.

CSR Format in the GPU andSkyline Format in the CPU

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 17 / 31

Heterogenous Decomposition

The main idea is to adequately partition and assign the subdomain tothe processor in proportion to its performance

I more powerful processors get larger subdomains

Combines the performance model and the METIS/SCOTCH library

The load of the processor will be balanced if the number ofcomputations performed by each processor is accordance to its speedon execution the kernel

si (ni ) =ni + 2FiT (ni )

ri (ni ) =si (ni )∑p si (ni )

ni ,new = N ∗ ri (ni )

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 18 / 31

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 19 / 31

Experimental Results: Heterogenous Decomposition

Metis Heterogeneous0

0.002

0.004

0.006

0.008

0.01

0.012

0.014

0.016

0.018

0.02

Decomositon Methods

Ave

rage

Tim

e P

er I

tera

tion

Hybrid CGHybrid Pipelined CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 20 / 31

Experimental Results: Speedup icoFoam I

N=100k

2 4 8 12 160

0.5

1

1.5

2

2.5

3

3.5

Processors

Sp

eed

up

Hybrid CGHybrid Pipelined CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 21 / 31

Experimental Results: Speedup icoFoam II

N=1M

2 4 8 12 160

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Processors

Sp

eed

up

Hybrid CGHybrid Pipelined CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 22 / 31

Experimental Results: Speedup laplacianFoam I

N=100k

2 4 8 12 160

1

2

3

4

5

6

7

8

Processors

Spee

dup

Hybrid CGHybrid Pipelined CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 23 / 31

Experimental Results: Speedup laplacianFoam II

N=1M

2 4 8 12 160

0.5

1

1.5

2

2.5

3

3.5

4

Processors

Spee

dup

Hybrid CGHybrid Pipelined CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 24 / 31

Experimental Results: Performance

103

104

105

106

107

1

2

4

8

16

32

64

128

256

512

Problem Size

MF

LO

Ps/

S

Hybrid Pipelined CGHybrid CG

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 25 / 31

Experimental Results: Roofline

0.125 0.25 0.5 1 2 4 8 16

1

2

4

8

16

32

64

128

256

512

FLOP:BYTE Ratio

Att

aina

ble

Gflo

p/s

DRAM BWPCI BWStream DRAM BW

Peak DP

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 26 / 31

1 IntroductionOpenFOAM CFD PackageOpenFOAM Applications

2 Performance AnalysislaplacianFoamicoFoamConjugate Gradient

3 Proposed OptimizationsHybrid CG SolverHybrid Pipelined CG SolverHeterogenous Decomposition

4 Experimental ResultsHeterogenous DecompositionSpeedup icoFoamSpeedup laplacianFoamHybrid Solvers Performance

5 Conclusion

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 27 / 31

Conclusion

Memory bound applications, such as the OpenFOAM selected solvers,can take better advantage of the full hardware potential, which is nowcomplex, hybrid and heterogeneous, if all resources are taken into ac-counts in a holistic approach.

Vector reduction kernel performs n × 1 memory transactions, with nthe vector size and cannot be combined with other operations → lowarithmetic intensity, low memory throughput and poor scalability whenincreasing number of GPU/CPU.

The need for dynamic load balancing scheduling, which adaptively bal-ances the workload during the run-time, by memory-aware work steal-ing.

The experimental results show that the hybrid implementation of bothsolvers significantly outperforms state-of-the-art implementations of awidely used open source package.

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 28 / 31

Thank You!

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 29 / 31

Backup Slides.

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 30 / 31

PISO Algorithm

Pressure Implicit with Splitting of Operators (PISO)*1 Set the boundary conditions.2 Solve the discretized momentum equation to compute an intermediate

velocity field.3 Compute the mass fluxes at the cells faces.4 Solve the pressure equation.5 Correct the mass fluxes at the cell faces.6 Correct the velocities on the basis of the new pressure field.7 Update the boundary conditions.8 Repeat from 3 for the prescribed number of times.9 Increase the time step and repeat from 1.

*J. H. Ferziger, M. Peric, Computational Methods for Fluid Dynamics, Springer, 3rd Ed., 2001. H. Jasak, Error Analysis

and Estimation for the Finite Volume Method with Applications to Fluid Flows, Ph.D. Thesis, Imperial College, London, 1996.

http://openfoamwiki.net

Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 31 / 31

top related