
www.idris.fr — Institut du Développement et des Ressources en Informatique Scientifique

Chemistry and HPC at IDRIS

Thibaut Very - Cours RFCT - 25/01/2017

Outline

Goals of this presentation
• Introduction to
  − The European High Performance Computing ecosystem
  − The French HPC ecosystem
  − GENCI
    ◦ How to compute at HPC centers
  − Focus on IDRIS
    ◦ Organization
    ◦ Machines
    ◦ User support
• Technical points
  − A few words on hardware
  − Why parallelism is important
• Application to chemistry
  − A few examples of problems you may meet
  − How to solve them


How is HPC organized in Europe and France?

PRACE - Description

Partnership for Advanced Computing in Europe
• Created in 2008
• 25 member countries
• Different tiers (pyramid):
  − Tier-0: European centers
  − Tier-1: national centers
  − Tier-2: regional/university centers
• 7 Tier-0 machines:
  − Marconi, Lenovo, 13 PFlop/s, CINECA
  − Piz Daint, Cray XC30, 7.8 PFlop/s, CSCS
  − Hazel Hen, Cray XC40, 7.4 PFlop/s, HLRS
  − JuQUEEN, IBM Blue Gene/Q, 5.9 PFlop/s, FZJ
  − SuperMUC, IBM iDataPlex, 3.2 PFlop/s, LRZ
  − Curie, Atos/Bull, 2 PFlop/s, CEA
  − MareNostrum III, IBM iDataPlex, 1.1 PFlop/s, BSC
• Advanced training courses (http://www.training.prace-ri.eu)
• In France: GENCI

HPC Organisation - GENCI

Grand Équipement National de Calcul Intensif
• Created in 2007
• Several shareholders:
  − Ministry of Education and Research (49%)
  − CNRS (20%)
  − CEA (20%)
  − Universities (10%)
  − INRIA (1%)
• 3 missions:
  − National equipment
  − Promote HPC at the European scale
  − Promote numerical simulation for academia and industry
    ◦ Tier-2 centers (Equip@Meso: mesocenters)
    ◦ Initiative HPC-PME (HPC for SMEs)
• 3 HPC centers:
  − CINES
  − IDRIS
  − TGCC

GENCI - Thematic committees

Research is divided into 11 committees:
• CT1: Environmental sciences
• CT2a: Non-reactive fluid flows
• CT2b: Reactive or multiphase fluid flows
• CT3: Biology and biomedical sciences
• CT4: Astrophysics and geophysics
• CT5: Theoretical and plasma physics
• CT6: Computer science, algorithms and mathematics
• CT7: Molecular dynamics in biology
• CT8: Quantum chemistry and molecular modeling
• CT9: Physics, chemistry and material properties
• CT10: Transversal applications

[Pie charts: share of the allocated hours (%) per thematic committee on the Ada and Turing machines]


GENCI - DARI

How to compute at HPC centers
• www.edari.fr
• ≈ 1.5 billion hours (real cost: ≈ €0.03/hour)
• 2 sessions per year (1-year projects)
• Preparatory access (a few tens of thousands of hours, depending on the machine)

Allocation workflow: 2 project calls per year → scientific evaluation by the 11 thematic committees → evaluation committee (attribution proposal) → attribution committee (arbitration) → allocation

GENCI - Project

Administrative part
• Project title
• Thematic committee
• Laboratory info
• Project manager info
• Technical manager
• Project support (ANR, Tier-2 center, industry, ...)

Technical part
• Software description and performance
• Type of machine architecture needed
• How much data space?
• Experience of the team

Scientific part
• Project description
  − Abstract
  − State of the art
  − Research plan
  − Methods
• Justification of the needs
• Bibliography


IDRIS - Presentation

Institut du Développement et des Ressources en Informatique Scientifique
• Institute for the development of and resources in scientific computing
• CNRS UPS851, founded at the end of 1993
• Located in Orsay (≈ 20 km south-west of Paris, on the Paris-Saclay campus)
• Building from 1969 (previously home of CIRCE)
• Simultaneously:
  − A computing center equipped with some of the most powerful supercomputers
  − A center of excellence for HPC

IDRIS - Teams

[Organization chart of the IDRIS teams]

IDRIS - Machines configuration

[Diagram: machines and storage; labels: 10 GB/s, 5 PB, 2 PB]

Ada: IBM x3750 M4
• 10,624 cores (2.7 GHz), 332 nodes
• RAM: 49 TB
• Rpeak: 233 TFlop/s

AdaPP: IBM x3850 X5 (pre/post-processing)
• 128 cores (2.67 GHz), 4 nodes
• RAM: 4 TB

Turing: IBM Blue Gene/Q
• 98,304 cores (1.6 GHz), 6,144 nodes
• RAM: 96 TB
• Rpeak: 1,258 TFlop/s

Ergon: IBM (data preservation)
• 13 nodes
• RAM: 4 TB

Oracle robot: STK SL8500 (data preservation)
• 6,300 magnetic tapes
• 10 PB

IDRIS - User Support

User support
• Significant resources since the creation of IDRIS
• ≈ 15 engineers
• Hotline
  − By phone and e-mail
  − Monday to Friday, 9 a.m. to 6 p.m.
  − Follow-up of problems
• Advanced support for projects
  − Porting, optimization (CPU, communications, memory, I/O)
  − From a few days to a few months
• Documentation on www.idris.fr
• Training courses
  − Languages (C, Fortran)
  − Parallelism (OpenMP, MPI, hybrid)
  − Free for academics
  − Industry welcome, but paying (CNRS Formation Entreprises)

IDRIS - Trainings

2017 trainings:
  Fortran Basics                         24.01.2017   3 days   Confirmed
  MPI                                    30.01.2017   4 days   Confirmed
  OpenMP                                 07.03.2017   3 days   Announced
  Fortran Advanced                       14.03.2017   4 days   Announced
  Fortran Basics                         03.05.2017   3 days   Announced
  Hybrid MPI/OpenMP                      10.05.2017   2 days   Announced
  C                                      29.05.2017   5 days   Announced
  MPI / OpenMP                           12.06.2017   5 days   Announced
  Fortran Advanced                       20.06.2017   4 days   Announced
  MPI                                    02.10.2017   4 days   Announced
  Fortran Basics                         10.10.2017   3 days   Announced
  OpenMP                                 17.10.2017   3 days   Announced
  HPC debugging                          09.11.2017   2 days   Announced
  Fortran Advanced                       14.11.2017   4 days   Announced
  Blue Gene/Q porting and optimization   23.11.2017   2 days   Announced
  Hybrid MPI/OpenMP                      05.12.2017   2 days   Announced
  Fortran 2003                           12.12.2017   3 days   Announced

To register for a training: https://cours.idris.fr/

Some technical aspects that help understand how to enhance performance

Software - Installation

Different environments
• Standard workstation
  − Build settings usually provided by the developers
• Supercomputer
  − Non-standard paths for libraries
  − Shared folders
  − Performance has to be tested

Test case: Quantum ESPRESSO (zeolite, PBE, Γ point, 10 steps)

Compiling: impact of the compiler options on the elapsed time

  Options                      Elapsed time
  -O0                          3 h 3 min
  -O2                          20 min 28 s
  -O2 -xAVX -mkl               6 min 6 s
  -O3 -xAVX -parallel -mkl     9 min

Note that the most aggressive options are not automatically the fastest: -O3 with automatic parallelization is slower here than -O2 -xAVX -mkl.

At IDRIS
• Users can compile and use their own versions
  − You can ask for help
• System-wide installations are done by the support team
  − But only official versions

Software - Modules

List of modules available at IDRIS


Job Scheduler - Options

Schedulers
• LoadLeveler (at IDRIS)
• SLURM
• PBS
• Univa Grid Engine
• ...

Usage
• Schedules numerous jobs
• Allocates the requested resources
  − Number of nodes and cores
  − Memory
  − Time
• Manages priorities, dependencies, etc.
• Specific syntax, usually shell comments (see the sketch below)
• Commands to be run

Example: Ada queue on 01/19
• 330 jobs in the queue: 91 waiting, 153 running, 86 held
• 208 users
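A minimal sketch of a LoadLeveler submission script, to illustrate the shell-comment syntax. The directive keywords are standard LoadLeveler, but the values and the poe launcher are assumptions for illustration; see the IDRIS documentation for real Ada scripts.

    #!/bin/bash
    # LoadLeveler directives are shell comments beginning with "# @".
    # @ job_name         = my_job
    # @ job_type         = parallel
    # Number of MPI processes requested:
    # @ total_tasks      = 64
    # Maximum elapsed time:
    # @ wall_clock_limit = 01:00:00
    # @ output           = $(job_name).$(jobid).out
    # @ error            = $(job_name).$(jobid).err
    # @ queue

    # Commands executed once the resources are allocated
    # (poe is assumed as the MPI launcher here):
    poe ./my_program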


Memory - Management

Different kinds of memory
• Cache
  − Close to the core
  − Access ≈ 1 ns (L1)
  − Small (a few kB to a few MB)
• RAM
  − Access still fast (≈ 100 ns)
  − Usually a few GB per core
• SSD disks
  − Access ≈ 10 µs
  − A few hundred GB
• Hard disks
  − Access ≈ 1 ms
  − A few TB

Why is it important?
• Memory strongly affects code performance (see the sketch below)
  − Try to keep everything in RAM
  − The OS uses RAM as a disk cache!
  − You should tune memory settings!
• Users should set up their jobs with this in mind
• HPC centers impose some limitations (size and number of files)
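To make the cache effect concrete, a small hypothetical C sketch (not from the slides): summing a matrix along its memory layout is cache-friendly, while striding across it typically costs several times more.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 4096

    /* a is a row-major N x N matrix. */
    static double sum_rowwise(const double *a)
    {
        double s = 0.0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                s += a[(size_t)i * N + j];   /* consecutive addresses */
        return s;
    }

    static double sum_colwise(const double *a)
    {
        double s = 0.0;
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                s += a[(size_t)i * N + j];   /* jumps of N doubles: cache misses */
        return s;
    }

    int main(void)
    {
        double *a = calloc((size_t)N * N, sizeof *a);
        if (!a) return 1;

        clock_t t0 = clock();
        double s1 = sum_rowwise(a);
        clock_t t1 = clock();
        double s2 = sum_colwise(a);
        clock_t t2 = clock();

        printf("row-wise: %.2f s, column-wise: %.2f s (sums: %g, %g)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
        free(a);
        return 0;
    }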

Examples
• VASP, 289 atoms, 1,024 processes, 2 geometry steps
  − With I/O (≈ 1 GB): 662 s
  − Without I/O: 338 s
• Crystal, 289 atoms, 1,024 processes, 1 SCF cycle
  − PCrystal: 12,300 files; crashed after 420 s
  − PCrystal: 1,040 files; 1,239 s
  − PCrystal optimized: 1,040 files; 1,117 s

An important technical aspect: parallelism

Parallelization - Useful vocabulary

• Speedup: $S(n) = T_{\mathrm{seq}} / T_{\mathrm{para}}(n)$
• Strong scaling: scaling with a fixed problem size
• Weak scaling: scaling with a problem size proportional to the number of tasks

[Plot: strong and weak scaling curves as a function of the number of cores]

Parallelization - Laws

Amdahl's law
• Stated in 1967
• Given a fixed problem size (strong scaling), how does the speedup evolve with the number of cores?
• If a fraction α of the code is serial:
  $S(n) = \dfrac{1}{\alpha + \frac{1 - \alpha}{n}}$
• Perfect scaling would be $S(n) = n$, but the speedup is bounded:
  $\lim_{n \to \infty} S(n) = \dfrac{1}{\alpha}$

[Plot: speedup vs. number of cores (1 to 16,384) for α = 0.01, 0.1, 0.5; schematic: the serial fraction α runs on 1 core, while the parallel fraction (1 - α) is split into (1 - α)/n on each of the n cores]

Quite pessimistic, and real applications usually do not keep a fixed size.
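For example, with a serial fraction $\alpha = 0.1$, even 1024 cores only give

$S(1024) = \dfrac{1}{0.1 + 0.9/1024} \approx 9.9,$

barely below the asymptotic bound $1/\alpha = 10$: beyond a few hundred cores, additional cores buy almost nothing.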


Parallelization - Laws

Gustafson-Barsis law
• Stated in 1988
• The problem size increases with the number of cores (weak scaling)
• If a fraction α of the code is serial:
  $S(n) = n - \alpha (n - 1)$
• Difficult to achieve in real applications
• Optimistic

[Plot: speedup vs. number of cores (up to 128) for α = 0.1, 0.2, 0.5, 0.9; schematic: the serial fraction α stays fixed, while the parallel work grows to n (1 - α) on n cores]
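For example, with $\alpha = 0.1$ on $n = 100$ cores,

$S(100) = 100 - 0.1 \times 99 = 90.1:$

the serial fraction costs only about 10% of the ideal speedup, instead of capping it at $1/\alpha = 10$ as in the strong-scaling case.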



Parallelization - Shared Memory Model

Shared memory
• 1 process creates several threads
• The work is shared between the threads
• All threads have access to the same memory space
• OpenMP example: core Hamiltonian

[Diagram: one process whose threads run on the cores of a node and share the node's memory]

double charge(int);
double distance(int, int);
double KE(int);

/* Core Hamiltonian: for each electron, the kinetic energy term plus
   the attraction to every nucleus. */
void core(int Nelectrons, int Natoms, double *Hc)
{
    /* The iterations over electrons are independent, so the loop
       can be shared between the threads. */
    #pragma omp parallel for
    for (int i = 0; i < Nelectrons; ++i)
    {
        Hc[i] = -KE(i);
        for (int a = 0; a < Natoms; ++a)
        {
            Hc[i] -= charge(a) / distance(i, a);
        }
    }
}

Pros
• Can be easy to parallelize, sometimes even automatically by the compiler

Cons
• Intranode only
• Difficult to debug
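As a usage note, the team size is usually chosen at run time through the OMP_NUM_THREADS environment variable; a minimal self-contained sketch, assuming a compiler with OpenMP support:

    #include <stdio.h>
    #include <omp.h>

    /* Compile with OpenMP enabled, e.g.: gcc -fopenmp check.c
       Run with e.g.: OMP_NUM_THREADS=4 ./a.out */
    int main(void)
    {
        #pragma omp parallel
        {
            /* Each thread reports its id and the team size. */
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }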


Parallelization - Distributed Memory model

Distributed memory
• Several independent processes are created
• The work is shared between the processes
• Each process has direct access only to its own memory
• Inter-process communication goes through the network
• MPI: Message Passing Interface

[Diagram: several cores, each with its own private memory, exchanging messages over the network]


Example: the worker side of a classic master/worker MPI matrix multiplication (declarations omitted).

    if (taskid > MASTER)
    {
        /* Receive a block of rows of A and the whole of B from the master. */
        mtype = FROM_MASTER;
        MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&a, rows * NCA, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&b, NCA * NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);

        /* Compute the local block of C = A * B. */
        for (k = 0; k < NCB; k++)
            for (i = 0; i < rows; i++)
            {
                c[i][k] = 0.0;
                for (j = 0; j < NCA; j++)
                    c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }

        /* Send the result block back to the master. */
        mtype = FROM_WORKER;
        MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
        MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
        MPI_Send(&c, rows * NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD);
    }

Pros
• Intra- or inter-node

Cons
• Needs explicit programming
• Needs an efficient network
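For completeness, a minimal self-contained MPI sketch of the same explicit send/receive style (an illustration, not code from the slides):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
        {
            /* Process 0 sends a value to process 1
               (run with at least 2 processes, e.g. mpirun -np 2). */
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
        else if (rank == 1)
        {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 1 of %d received %d\n", size, token);
        }

        MPI_Finalize();
        return 0;
    }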


Parallelization - Hybrid OpenMP/MPI

[Diagram: 4 nodes; on each node one MPI process spawns OpenMP threads on the node's cores and shares the node's memory; the processes communicate across nodes]

Hybrid programming
• Adapted to modern architectures
• Can increase code performance
  − 2- to 5-fold reduction of the memory footprint
  − Larger systems for the same cost
• Coding is tricky (see the sketch below)
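A minimal hybrid sketch, assuming an MPI library with threading support: each MPI process spawns OpenMP threads inside its node, so communication between nodes goes through MPI while intra-node work is shared between threads.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Request a thread support level compatible with OpenMP regions. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Typically one MPI process per node, several threads per process. */
        #pragma omp parallel
        {
            printf("MPI rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

It would typically be launched with one process per node and OMP_NUM_THREADS set to the number of cores per node.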


Application to chemistry-related software problems

Parallelism - Chemistry

What can you parallelize?

  System splitting          Wave function, electron density   Mathematical functions
  Multiple images           Basis set coefficients            Functions (FFT, ...)
  NEB                       Bands                             Eigensolvers
  Numerical frequencies     Grid                              Matrix handling

  (the columns are ordered by parallelization complexity)

Parallelism - Grand Challenges 2012

Free-energy surface
• Force field
• Amino acids through a membrane
• 15,000 atoms
• 61 replicas
• Gromacs
• 2 µs/day
Bonhenry, Dehez, Tarek (Lorraine University)

Ionic liquid
• DFT
• Structure and properties
• ≈ 900 atoms
• 20 ps
• CP2K/CPMD
Sieffert (Grenoble University)

Li-ion anode
• BigDFT
• Up to 625 atoms
• Surface boundary conditions
S. Krishnan, D. Caliste, T. Deutsch, L. Genovese, P. Pochet (Grenoble University, CEA)

Parallelization - Summary

What you can gain
• Reduced time to solution for a fixed-size problem
• More memory
  − Larger systems
  − New kinds of computations

Limitations
• Scalability is not infinite
• Parallelism overhead
• Workload balance between threads and processes
• Overall resource consumption (CPU hours) is higher
• Code adaptation is needed

Parallelism is critical
• Do not expect improvements from hardware alone
• It fits the architecture of the machines
• It enables larger simulations

Chemistry at IDRIS

Software - Installed at IDRIS

On Wikipedia: 125 different pieces of software are listed.

Chemistry - Installed At IDRIS

Name              | License                  | Access | Shared | Distributed | Accelerators
------------------+--------------------------+--------+--------+-------------+-------------
ADF               | Site (€6,400/year)       | Free   | Yes    | Yes         | No
Gaussian          | Site (€28,750/20 years)  | Ask    | Yes    | Yes         | Yes
VASP              | Support (€1,000/group)   | Req.   | No     | Yes         | Yes
Molpro            | Site (£10,000)           | Free   | No     | Yes         | No
Molcas            | Site (50,000 SEK)        | Free   | No     | Yes         | No
ABINIT            | GPL                      | Free   | Yes    | Yes         | Yes
CPMD              | Site                     | Free   | Yes    | Yes         | Yes
Crystal           | Site (€6,800/year)       | Ask    | No     | Yes         | No
SIESTA            | Site                     | Ask    | No     | Yes         | No
Quantum ESPRESSO  | GPL                      | Free   | Yes    | Yes         | Yes
NWChem            | ECL                      | Free   | Yes    | Yes         | No
CP2K              | GPL                      | Free   | Yes    | Yes         | Yes
BigDFT            | GPL                      | Free   | Yes    | Yes         | Yes
Gromacs           | GPL                      | Free   | Yes    | Yes         | Yes
NAMD              | GPL                      | Free   | Yes    | Yes         | Yes
LAMMPS            | GPL                      | Free   | No     | Yes         | Yes

(Shared/Distributed: shared- and distributed-memory parallel versions; Accelerators: accelerator/GPU support)

Examples - Turbomole

Details
• Turbomole (local workstation + Curie)
• 3 parallel modes:
  − MPI
  − SMP
  − SMP (with MPI)
• 190 atoms
• Geometry optimization

From workstation to computing center
• Large performance drop for SMP-MPI
  − Only 2 cores were working on Curie
• The problem vanished after tuning the MPI kickstarter options
  (core binding: -vapi -affmanual="0x0,...")

3 modes to rule them all: parallel efficiency
• Most efficient: SMP-MPI
• MPI: 1.2× slower
• SMP: 2.5× slower

Test your system/code on the machines!


Examples - Crystal

Details
• MPPCrystal
• 298 atoms
• Γ point

The workload per core increases; this is stated in the output file.

Examples - Quantum Espresso

Details
• 144 atoms
• PBE
• 30 Ry
• 40 steps

Time needed:
  # cores            Elapsed time   CPU time
  16 MPI             77 min         21 h
  50 MPI             42 min         45 h
  64 MPI             46 min         49 h
  128 MPI            52 min         111 h
  100 MPI+OpenMP     29 min         49 h

Diagnosis
• Unbalanced workload
• Some cores do not compute directly

Solution
• Improve the balance by using OpenMP threads

Take home messages - User support

If there is something strange in
• How your code runs
• Your code installation
• The administration of your account

Who you gonna call?
• IDRIS support
  − By email: [email protected]
  − By phone: 01 69 35 85 55

Larger projects
• Advanced support
  − Large-scale code porting
  − Parallelization of code
  − New algorithm implementation
  − Debugging and optimization
  − Code coupling

Conclusions

Architectures
• More cores (Top500 #1: ≈ 10.6 million cores)
• Less power per CPU (Green500)
• Efficient networks
• Software has to adapt

Tune your computation
• Test the parallelism parameters of your software
• Use memory settings wisely
• Adapt the size of your system

Tips
• Know your software
• Know your environment
• RTFM (Read the famous manual)

Acknowledgments

Thanks to
• Denis Girou
• Fabien Leydier
• Eric Gloaguen
• Pascal Voury
• Pierre-François Lavallée
• All of the IDRIS support team

And thank you for your attention! Any questions? [email protected]