large scale parallel reservoir simulations on a … · large scale parallel reservoir simulations...

55
Large Scale Parallel Reservoir Large Scale Parallel Reservoir Simulations on a Linux PC Simulations on a Linux PC - - Cluster Cluster Walid A. Habiballah Walid A. Habiballah M. Ehtesham Hayder M. Ehtesham Hayder Saudi Aramco Saudi Aramco Fourth LCI International Conference on Linux Clusters Fourth LCI International Conference on Linux Clusters ClusterWorld Conference and Expo ClusterWorld Conference and Expo San Jose, June 23 San Jose, June 23 - - 26, 2003 26, 2003

Upload: hakiet

Post on 10-Aug-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Large Scale Parallel Reservoir Large Scale Parallel Reservoir Simulations on a Linux PCSimulations on a Linux PC--ClusterCluster

Walid A. HabiballahWalid A. HabiballahM. Ehtesham HayderM. Ehtesham Hayder

Saudi AramcoSaudi Aramco

Fourth LCI International Conference on Linux ClustersFourth LCI International Conference on Linux ClustersClusterWorld Conference and ExpoClusterWorld Conference and Expo

San Jose, June 23San Jose, June 23--26, 200326, 2003

OutlineOutline

•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco

•• Formulation of the simulator (POWERS)Formulation of the simulator (POWERS)

•• PC PC -- ClustersClusters

•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster

•• Results Results

•• ConclusionsConclusions

Seismic Interpretation

Geological Model

Core Analysis & well logs

Simulation Model

Well Production& Injection Data

Demands for large scale reservoir simulations at Demands for large scale reservoir simulations at

Saudi Aramco is rapidly increasingSaudi Aramco is rapidly increasing

•• to reduce simulation turn around timesto reduce simulation turn around times•• to resolve heterogeneity in reservoirsto resolve heterogeneity in reservoirs•• to couple reservoir and facility simulationsto couple reservoir and facility simulations•• for well location and trajectory planningfor well location and trajectory planning

Other potential benefits of large scale reservoir Other potential benefits of large scale reservoir simulations to Saudi Aramcosimulations to Saudi Aramco

•• Longer time simulationsLonger time simulations•• Possibility of automatic or semiPossibility of automatic or semi--automatic history matchingautomatic history matching•• Computational steering Computational steering •• Exploitations of technological advances in computing Exploitations of technological advances in computing •• ……………………

Growth of POWERS million+ cell simulation modelsGrowth of POWERS million+ cell simulation models

411

19

3944

51

33

010203040506070

2000 2001 2002 2003 2004 2005 2006

Num

ber o

f Mod

els

Projected computing capacity required for Projected computing capacity required for reservoir simulations reservoir simulations

1 2 3

1921

23

15

0

5

10

15

20

25

30

2000 2001 2002 2003 2004 2005 2006

Nor

mal

ized

Com

pute

Cap

acity

13

5

1316

18

21

0

5

10

15

20

2 5

2000 2001 2002 2003 2004 2005 2 006

Projected storage capacity required for reservoir Simulations

Nor

mal

ized

Sto

rage

Cap

acity

OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco

•• Formulation of the simulator (POWERS)Formulation of the simulator (POWERS)

•• PC PC -- ClustersClusters

•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster

•• Results Results

•• ConclusionsConclusions

PParallelarallel OOilil WWatatERER SSimulatorimulator

•• It is a parallel reservoir simulatorIt is a parallel reservoir simulator•• Functionality includes:Functionality includes:

–– BlackBlack--OilOil–– CompositionalCompositional–– Dual Porosity Dual PermeabilityDual Porosity Dual Permeability–– Local Grid RefinementLocal Grid Refinement–– Comprehensive Well ManagementsComprehensive Well Managements

qUt

Sa

a a=∇+

∂•

)(φρ

)( HgPKSK

U aaa

a

aaa ∇−∇−= ρρ

µ

1=∑ aS

Governing Equations:Governing Equations:

Numerical algorithmNumerical algorithm•• Implicit FormulationImplicit Formulation•• NewtonNewton--Raphson for nonlinearityRaphson for nonlinearity•• Generalized Conjugate Residual to solve linearized equationsGeneralized Conjugate Residual to solve linearized equations•• RedRed--Black orderingBlack ordering

OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco

•• Formulation of the simulatorFormulation of the simulator

•• PC PC -- ClustersClusters

•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster

•• Results Results

•• ConclusionsConclusions

High-Speed Switch

PCPC--ClusterCluster

Performance of different processors using the Performance of different processors using the SPECfp2000 benchmarkSPECfp2000 benchmark

322

532

843 872

0

200

400

600

800

1000

Power 3-375M Hz IBM -NH-2

SGI M IPS-Pro500M Hz-R14K

Power 4-1 GHzIBM -Reggata

Pent ium IV2.4GHz

SPEC

fp20

00 N

umbe

rs

Bandwidth and Latency of various switches in Bandwidth and Latency of various switches in PCPC--Clusters configurationsClusters configurations

5400Quadrics

7421Myrinet

26-12128Gigabit

15012Fast Ethernet

Latency(microsecond)

Bandwidth(Mbytes/sec)Switch

0

20

40

60

80

1997 1998 1999 2000 2001 2002

Num

ber o

f Ins

talla

tions Intel

HPAlphaAMD

Number of installation of NOW Clusters in the Top500 supercomputers list

Status of PCStatus of PC--Cluster Computations at Saudi AramcoCluster Computations at Saudi Aramco

•• Seismic processing activities in Saudi Aramco are taking Seismic processing activities in Saudi Aramco are taking full advantages of PCfull advantages of PC--cluster technology and abandoned cluster technology and abandoned other systems.other systems.

•• Reservoir simulation models are now being migrated to PCReservoir simulation models are now being migrated to PC--Cluster platform (this studyCluster platform (this study).).

Quadrics Cluster (Cluster_QQuadrics Cluster (Cluster_Q))

128 GBTotal RAM2.0 GHzClock64 (Xeon)Processors (CPUs)32Nodes

Configuration of the Quadrics Cluster

Quadrics Cluster (Cluster_QQuadrics Cluster (Cluster_Q))

RMSJob submission

MPICH - ElanCommunication

Vampir, VampirTrace, VtuneOptimization Tools

TotalviewDebugger

Intel ifc (v.6), iccCompiler

Redhat Linux 7.2OS

Software on Quadrics Cluster

Q u a d r ic sN e t w o r k

E th e r n e tN e t w o r k

S er ia l C o n c e n tra to r

W o rk sta ti o n

W o rk sta tio n

W o rk sta tio n

Q u a d ri csF a s t E th er n e t

S er ia l

Q u a d r ic sN e t w o r k

E th e r n e tN e t w o r k

S er ia l C o n c e n tra to r

W o rk sta ti o n

W o rk sta tio n

W o rk sta tio n

Q u a d ri csF a s t E th er n e t

S er ia l

Q u a d ri csF a s t E th er n e t

S er ia l

Quadrics Cluster (Cluster_Q)Quadrics Cluster (Cluster_Q)

Myrinet Cluster (Cluster_M)Myrinet Cluster (Cluster_M)

20 TBExternal Storage

1024 GB (1 TB) Total RAM

2.4 GHzClock

512 (Xeon)Processors (CPUs)

256Nodes

2 Sub-Clusters

Configuration of Myrinet Cluster

Myrinet Cluster (Cluster_MMyrinet Cluster (Cluster_M))

PBSJob submission

MPICH - MyrinetCommunication

Vampir, VampirTrace, VtuneOptimization Tools

TotalviewDebugger

Intel ifc (v.7), iccCompiler

Redhat Linux 7.3OS

Software on Myrinet Cluster

Master

Management

10 TB10 TB

MyrinetSwitch

to the router

32 nodes 32 nodes 32 nodes 32 nodes

MyrinetNetwork

Ethernet Network

Master

Management

10 TB10 TB

MyrinetSwitch

to the router

32 nodes 32 nodes 32 nodes 32 nodesMaster

Management

10 TB10 TB

MyrinetSwitch

to the router

32 nodes 32 nodes 32 nodes 32 nodes

MyrinetNetwork

Ethernet Network

Myrinet Cluster (Cluster_M)Myrinet Cluster (Cluster_M)

OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco

•• Formulation of the simulatorFormulation of the simulator

•• PC PC -- ClustersClusters

•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster

•• Results Results

•• ConclusionsConclusions

Parallel Implementations of POWERSParallel Implementations of POWERS

•• POWERS was originally written in Connection Machine Fortran. POWERS was originally written in Connection Machine Fortran.

•• Code was later ported to IBM SP using OpenMP and MPI calls.Code was later ported to IBM SP using OpenMP and MPI calls.

•• No modification of the numerical algorithm was made in this or iNo modification of the numerical algorithm was made in this or in n the previous port. the previous port.

Summary of PCSummary of PC--Cluster porting activitiesCluster porting activities

•• Recompiled the code on cluster using the IntelRecompiled the code on cluster using the Intel--ifc compilerifc compiler•• Expanded MPI parallelization into a second dimensionExpanded MPI parallelization into a second dimension•• Benchmarked interconnect switches (MericomBenchmarked interconnect switches (Mericom--Myrinet and Myrinet and

QuadricsQuadrics--Elan) using code kernels and the full codeElan) using code kernels and the full code•• Revised MPI communication algorithms in the codeRevised MPI communication algorithms in the code•• Tuned performanceTuned performance

Node 0

Node n

Node 1

J-Dimension

MPI

CPU 0 CPU mThreadsI-D

imen

sion

Original Implementation Original Implementation

IBMIBM--NH2NH2 PCPC--ClusterCluster

Porting from IBM NH2 to PC Porting from IBM NH2 to PC -- ClusterCluster

J-Dimension I-Dimension

I-DimensionMPI-Communication

CPU 0CPU 1

Node 1

CPU 0

CPU 1

Node m

CPU 0

CPU 1

Node 0

J-Dimension

Grid Partitioning

OpenMPThreads

New ImplementationNew Implementation

OutlineOutline•• Reservoir simulationsReservoir simulations

•• PC PC -- ClustersClusters

•• Formulation of the simulatorFormulation of the simulator

•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster

•• ResultsResults

•• ConclusionsConclusions

242000.5Model IV

624,0000.2Model III

502,9609.6Model II

31443.7Model I

Years Simulated

Numberof wells

Grid Size(million cells)

Model

Test CasesTest Cases

Model I Execution time on PC Clusters and IBM_NH2Model I Execution time on PC Clusters and IBM_NH2

0

3000

6000

9000

12000

16 CPU 32 CPU

Cluster_QCluster_MIBM NH2

Tim

e(s

ec)

A comparison of platformsA comparison of platforms

0

0.5

1

1.5Ti

me

(hr)

IBM NH2 IBMRegatta

Cluster_Q

Execution time of Model I on PC ClustersExecution time of Model I on PC Clusters

1000

2000

3000

4000

16 32

Number of Processors

Tim

e (s

ec)

Cluster_Q(v.6)Cluster_M(v.6)Cluster_M(v.7)

Model I Simulation timeModel I Simulation time

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

16 32 48 64 80 96 120Number of Processors

IBM NH2 Cluster_M

Tim

e(s

ec)

Scalability of Model IScalability of Model I

16

32

48

64

80

96

16 32 48 64 80 96

Number of Processors

Spe

ed u

p

IdealNH2 (total)NH2 (core)Cluster (total)Cluster (core)

Simulations of Model IISimulations of Model II

0

5

10

15

20

25

30

IBM NH2 PC Cluster96 Processors

Tim

e (h

r)

Simulations of Model II on Cluster _MSimulations of Model II on Cluster _M

0

4

8

12

16

64 96 125

Number of Processors

Tim

e (h

r)

Simulations of Model II on Cluster _MSimulations of Model II on Cluster _M

64

96

128

64 96 128Number of Processors

IdealActualComputation

Spe

ed u

p

Timing of Code Sections in Model ITiming of Code Sections in Model I

0

1000

2000

3000

16 32 48 64 96 120

Number of Processors

Tim

e (s

ec)

CommSynchCore Total

Model III execution timeModel III execution time

4,6753,7205,230---50

7,7004,8605,7008,00020

13,5007,0907,3509,90010

98,35045,83045,83056,1001

IBM NH2(sec.)

Cluster_M(opt)(sec.)

Cluster_M(orig)(sec.)

Cluster_Q(sec.)

CPU

Effect of optimized communication algorithm for Model I Effect of optimized communication algorithm for Model I

SimulationSimulation

0.7

0.75

0.8

0.85

0.9

0.95

1

16 32 64 96 120

Number of Processors

T_op

t / T

_orig

Total Time

Core Time

Communication Overhead in Model III SimulationCommunication Overhead in Model III Simulation

0

500

1000

1500

2000

2500

3000

3500

10 20 50

Number of Processors

Tim

e (s

ec)

Reg Comm (orig)Reg Comm (opt)Irreg Comm (orig)Irreg Comm (opt)

Timing with Different Decompositions in Models I and IVTiming with Different Decompositions in Models I and IV

0

1000

2000

3000

32x1 16x2 8x4 4x8 2x16 1x32

Blocking

Tim

e (s

ec)

Model I total Model I coreModel IV totalModel IV core

OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco

•• PC PC -- ClustersClusters

•• Formulation of the simulatorFormulation of the simulator

•• Implementation of the Simulator on PCImplementation of the Simulator on PC--ClusterCluster

•• Results Results

•• ConclusionsConclusions

ConclusionsConclusions

Based on our observations in this work, we conclude Based on our observations in this work, we conclude the followingsthe followings::

•• PCPC--Clusters are highly attractive option for parallel reservoir Clusters are highly attractive option for parallel reservoir simulations in terms of performance and cost. simulations in terms of performance and cost.

•• Simulations of massive reservoir models (in the order of 10 Simulations of massive reservoir models (in the order of 10 million cells) are possible at affordable computing cost.million cells) are possible at affordable computing cost.

•• Serial computation, synchronization and I/O issues become Serial computation, synchronization and I/O issues become significant at high number of processorssignificant at high number of processors..

ConclusionsConclusions•• Commodity switches can be effectively used in high Commodity switches can be effectively used in high

performance computing platform for reservoir simulation.performance computing platform for reservoir simulation.

•• Both hardware and software tools such as compilers are Both hardware and software tools such as compilers are constantly improving which in turn will reduce execution constantly improving which in turn will reduce execution times and computing costs on PCtimes and computing costs on PC--clusters.clusters.

•• The computing hardware is continuously changing and we will The computing hardware is continuously changing and we will follow up and monitor to take advantage of new and emerging follow up and monitor to take advantage of new and emerging technologies.technologies.