Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Amani AlOnazi*, David E. Keyes*, Alexey Lastovetsky†, Vladimir Rychkov†
*Extreme Computing Research Center, KAUST, Thuwal, Saudi Arabia; †Heterogeneous Computing Laboratory, UCD, Dublin, Ireland
May 21, 2014
Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 1 / 31
Overview
1 Introduction
  OpenFOAM CFD Package
  OpenFOAM Applications
2 Performance Analysis
  laplacianFoam
  icoFoam
  Conjugate Gradient
3 Proposed Optimizations
  Hybrid CG Solver
  Hybrid Pipelined CG Solver
  Heterogeneous Decomposition
4 Experimental Results
  Heterogeneous Decomposition
  Speedup icoFoam
  Speedup laplacianFoam
  Hybrid Solvers Performance
5 Conclusion
Motivation
Hardware changes have to be taken into account
  Parallelism and heterogeneity in modern hardware
Per-processor performance on heterogeneous systems
Algorithms and codes have to be redesigned
The heterogeneity of these platforms leads to several challenges, and much contemporary attention is devoted to new software solutions. This trend in HPC platforms invites redesign of the CFD packages, or of the algorithms themselves, to use these platforms efficiently.
"I would rather have today's algorithms on yesterday's computers than vice versa."
OpenFOAM CFD Package
Open source Field Operation And Manipulation (OpenFOAM) is a library written in C++ used to solve PDEs
It covers a wide range of solvers employed in CFD, such as Laplace and Poisson equations, incompressible flow, multiphase flow, and user-defined models
Free, open source, parallel, modular, and flexible
Large, user-driven support community
OpenFOAM Parallel Approach
MPI Bulk Synchronous Parallel Model
OpenFOAM uses MPI to provide parallel multi-processor functionality
The mesh and its associated fields are divided into subdomains and allocated to separate processors (domain decomposition)
Domain Decomposition Methods
It provides different utilities for partitioning the domain, such as:
Simple
Hierarchical
METIS
SCOTCH
The communication between the subdomains is implicitly handled within the package
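For concreteness, the partitioning method is selected in a case's system/decomposeParDict file; a minimal fragment is sketched below (the subdomain count and method chosen here are illustrative, not from the talk):

```
// system/decomposeParDict (fragment; values illustrative)
numberOfSubdomains  4;

method              scotch;   // alternatives: simple, hierarchical, metis
```

Running decomposePar with this dictionary splits the mesh and fields into one subdirectory per processor.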
OpenFOAM Selected Applications
icoFoam The incompressible laminar Navier-Stokes equations can be solved by icoFoam, which applies the PISO algorithm in a time-stepping loop.
∇ · u = 0
∂u/∂t + ∇ · (uu) − ∇ · (ν∇u) = −∇p
p: CG
u: Bi-CGSTAB
laplacianFoam The solver is used to find the solution of the Laplacian equation. The equation contains one variable, a passive scalar, for instance a temperature, T.
∂T/∂t − ∇²(D_T T) = 0
T : CG
Performance Analysis: Platform
A single node consisting of two sockets, each socket connected to a GPU device (Tesla C2050)
Each socket holds a six-core Intel Xeon X5670 CPU @ 2.93 GHz
laplacianFoam: 3D Heat Equation
Heat equation test case solved over a three-dimensional slab geometry using the laplacianFoam solver.
icoFoam: Lid-driven Cavity Flow
The lid-driven cavity flow test case contains the solution of a laminar, isothermal, and incompressible flow over a three-dimensional cubic geometry. The top boundary of the cube is a wall that moves in the x direction, whereas the rest are static walls.
Conjugate Gradient: One Iteration
A single iteration of the conjugate gradient solver using a matrix derivedfrom the 3D heat equation.
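The kernels that make up one such iteration can be made explicit in a short sketch. This NumPy version (a stand-in for OpenFOAM's CG, not its actual code; names are ours) shows the single SpMV, the two dot products acting as global reductions, and the three vector updates that dominate the profile:

```python
import numpy as np

def cg_iteration(A, x, r, p, rr_old):
    """One (unpreconditioned) CG iteration, kernel by kernel."""
    q = A @ p                        # SpMV: dominant, memory-bound kernel
    alpha = rr_old / (p @ q)         # dot product -> global reduction
    x = x + alpha * p                # axpy
    r = r - alpha * q                # axpy
    rr_new = r @ r                   # dot product -> global reduction
    p = r + (rr_new / rr_old) * p    # axpy
    return x, r, p, rr_new

def cg_solve(A, b, tol=1e-10, maxiter=100):
    """Drive cg_iteration until the residual norm drops below tol."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(maxiter):
        if np.sqrt(rr) < tol:
            break
        x, r, p, rr = cg_iteration(A, x, r, p, rr)
    return x
```

The two reductions in each iteration are the synchronization points that the pipelined variant later in the talk tries to hide.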
Proposed Optimizations
Hybrid CG Solver
Hybrid Pipelined CG Solver
Heterogeneous Decomposition
"Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs."
Hybrid CG
Multicore/MultiGPU
CUDA kernels are based on CUSP and Thrust calls
CSR format on the GPU and skyline format on the CPU
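The CPU/GPU division of labor can be pictured as a row-block split of the matrix-vector product. In this NumPy stand-in (dense, host-only; function name and split parameter are ours) the first block plays the role of the CUSP CSR SpMV launched on the GPU and the second the skyline SpMV kept on the CPU:

```python
import numpy as np

def hybrid_spmv(A, x, gpu_fraction=0.5):
    """Row-block split of y = A @ x, mimicking a hybrid CPU/GPU SpMV.

    In the real solver the first block would be a CUSP CSR SpMV on the
    device and the second an OpenFOAM skyline SpMV on the host; here both
    run on the host purely to show the decomposition.
    """
    split = int(A.shape[0] * gpu_fraction)   # rows assigned to the "GPU"
    y_top = A[:split] @ x                    # device portion (simulated)
    y_bot = A[split:] @ x                    # host portion
    return np.concatenate([y_top, y_bot])
```

Choosing gpu_fraction well is exactly the problem the heterogeneous decomposition addresses later in the talk.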
Hybrid Pipelined CG
Recently introduced by Ghysels and Vanroose*
Reduces the global communication to one synchronization per iteration instead of three
Offers better scalability at the price of extra computations
r_0 = b − Ax_0
w_0 = Ar_0
k = 0
while ||r_k||_2 ≥ τ do
    λ_k = dot(r_k, r_k)
    δ = dot(w_k, r_k)
    q = SpMV(A, w_k)
    if k > 0 then
        β = λ_k / λ_{k−1}
        α = λ_k / (δ − β·λ_k / α)
    else
        β = 0
        α = λ_k / δ
    end if
    z = q + β·z
    s = w + β·s
    p = r + β·p
    x = x + α·p
    r = r − α·s
    w = w − α·z
    k = k + 1
end while
* P. Ghysels and W. Vanroose, Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Computing, 2013.
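A minimal sequential sketch of the algorithm above in NumPy may help; variable names follow the pseudocode, while the hybrid CPU/GPU and MPI aspects are omitted:

```python
import numpy as np

def pipelined_cg(A, b, tol=1e-8, maxiter=200):
    """Pipelined CG after Ghysels & Vanroose: one SpMV and one
    combinable pair of dot products per iteration."""
    n = b.shape[0]
    x = np.zeros(n)
    r = b - A @ x
    w = A @ r
    z = np.zeros(n); s = np.zeros(n); p = np.zeros(n)
    lam_prev = alpha = 0.0
    for k in range(maxiter):
        if np.linalg.norm(r) < tol:
            break
        lam = r @ r            # these two dot products form the single
        delta = w @ r          # global reduction of the iteration
        q = A @ w              # SpMV, overlappable with that reduction
        if k > 0:
            beta = lam / lam_prev
            alpha = lam / (delta - beta * lam / alpha)
        else:
            beta = 0.0
            alpha = lam / delta
        z = q + beta * z
        s = w + beta * s
        p = r + beta * p
        x = x + alpha * p
        r = r - alpha * s      # recurrence replaces r = b - A x
        w = w - alpha * z      # recurrence replaces w = A r
        lam_prev = lam
    return x
```

The extra recurrences for s, z, and w are the "extra computations" traded for the reduced synchronization.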
Hybrid Pipelined CG
Multicore/MultiGPU
CUDA kernels are based on CUSP and Thrust calls
CSR format on the GPU and skyline format on the CPU
Heterogeneous Decomposition
The main idea is to adequately partition the domain and assign each subdomain to a processor in proportion to its performance
  more powerful processors get larger subdomains
Combines the performance model with the METIS/SCOTCH library
The load of the processors is balanced when the number of computations performed by each processor is in accordance with its speed in executing the kernel
s_i(n_i) = (n_i + 2F_i) / T_i(n_i)
r_i(n_i) = s_i(n_i) / Σ_p s_p(n_p)
n_i,new = N · r_i(n_i)
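These formulas can be applied directly. In the sketch below (function and variable names are ours, as is the reading of the symbols: n_i as subdomain cell count, F_i as its face count, T_i as measured kernel time, N as total cells):

```python
def heterogeneous_ratios(n, faces, times):
    """Relative speeds s_i = (n_i + 2*F_i) / T_i, normalized to r_i."""
    speeds = [(ni + 2 * fi) / ti for ni, fi, ti in zip(n, faces, times)]
    total = sum(speeds)
    return [s / total for s in speeds]

def rebalance(N, ratios):
    """New subdomain sizes n_i,new = N * r_i, rounded to whole cells."""
    return [round(N * r) for r in ratios]
```

For example, two equal subdomains where one processor is three times slower yield ratios 0.75 and 0.25, so the faster processor receives three quarters of the cells on the next decomposition.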
Experimental Results: Heterogeneous Decomposition
[Figure: average time per iteration (0-0.02 s) for the METIS and heterogeneous decomposition methods, comparing Hybrid CG and Hybrid Pipelined CG]
Experimental Results: Speedup icoFoam I
N=100k
[Figure: speedup of icoFoam vs. number of processors (2, 4, 8, 12, 16) for Hybrid CG and Hybrid Pipelined CG; y-axis 0-3.5]
Experimental Results: Speedup icoFoam II
N=1M
[Figure: speedup of icoFoam vs. number of processors (2, 4, 8, 12, 16) for Hybrid CG and Hybrid Pipelined CG; y-axis 0-2]
Experimental Results: Speedup laplacianFoam I
N=100k
[Figure: speedup of laplacianFoam vs. number of processors (2, 4, 8, 12, 16) for Hybrid CG and Hybrid Pipelined CG; y-axis 0-8]
Experimental Results: Speedup laplacianFoam II
N=1M
[Figure: speedup of laplacianFoam vs. number of processors (2, 4, 8, 12, 16) for Hybrid CG and Hybrid Pipelined CG; y-axis 0-4]
Experimental Results: Performance
[Figure: sustained MFLOP/s (1-512, log scale) vs. problem size (10^3-10^7) for Hybrid Pipelined CG and Hybrid CG]
Experimental Results: Roofline
[Figure: roofline model; attainable GFLOP/s (1-512) vs. FLOP:byte ratio (0.125-16), with DRAM bandwidth, PCI bandwidth, Stream DRAM bandwidth, and peak double-precision ceilings]
Conclusion
Memory-bound applications, such as the selected OpenFOAM solvers, can take better advantage of the full hardware potential, which is now complex, hybrid, and heterogeneous, if all resources are taken into account in a holistic approach.
The vector reduction kernel performs n × 1 memory transactions, with n the vector size, and cannot be combined with other operations → low arithmetic intensity, low memory throughput, and poor scalability as the number of GPUs/CPUs increases.
The need for dynamic load-balancing scheduling, which adaptively balances the workload at run-time via memory-aware work stealing.
The experimental results show that the hybrid implementation of both solvers significantly outperforms state-of-the-art implementations of a widely used open source package.
Thank You!
Backup Slides.
PISO Algorithm
Pressure Implicit with Splitting of Operators (PISO)*
1. Set the boundary conditions.
2. Solve the discretized momentum equation to compute an intermediate velocity field.
3. Compute the mass fluxes at the cell faces.
4. Solve the pressure equation.
5. Correct the mass fluxes at the cell faces.
6. Correct the velocities on the basis of the new pressure field.
7. Update the boundary conditions.
8. Repeat from 3 for the prescribed number of times.
9. Increase the time step and repeat from 1.
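The nine steps form two nested loops; this Python skeleton makes the control flow explicit (every step function is a hypothetical stub that only records its own name, standing in for the corresponding OpenFOAM operation):

```python
def piso_loop(n_correctors, n_timesteps):
    """Control-flow skeleton of PISO; stubs record execution order only."""
    trace = []
    step = trace.append

    for _ in range(n_timesteps):          # step 9: advance the time step
        step("set BC")                    # step 1
        step("solve momentum")            # step 2: intermediate velocity
        for _ in range(n_correctors):     # step 8: repeat steps 3-7
            step("compute fluxes")        # step 3
            step("solve pressure")        # step 4
            step("correct fluxes")        # step 5
            step("correct velocity")      # step 6
            step("update BC")             # step 7
    return trace
```

The inner corrector loop is where icoFoam's CG pressure solves, the dominant cost in the earlier profiles, are repeated each time step.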
* J. H. Ferziger and M. Peric, Computational Methods for Fluid Dynamics, Springer, 3rd ed., 2001; H. Jasak, Error Analysis and Estimation for the Finite Volume Method with Applications to Fluid Flows, Ph.D. thesis, Imperial College London, 1996; http://openfoamwiki.net