www.idris.fr
Institut du developpement et des ressources en informatique scientifique

Chemistry and HPC at IDRIS
Thibaut Very - Cours RFCT - 25/01/2017
Outline

Goals of this presentation
• Introduction to
  − European High Performance Computing ecosystem
  − French HPC ecosystem
  − GENCI
    X How to compute at HPC centers
  − Focus on IDRIS
    X Organization
    X Machines
    X User support
• Technical points
  − A few words on hardware
  − Why parallelism is important
• Application to chemistry
  − A few examples of problems you can meet
  − How to solve them
Institut du developpement et des ressources en informatique scientifique - Thibaut Very - Cours RFCT - 25/01/2017 - 2 / 38
How is HPC organized in Europe/France?
PRACE - Description

Partnership for Advanced Computing in Europe
• Created in 2008
• 25 member countries
• Different tiers
• 7 tier-0 machines
  − Marconi, Lenovo, 13 PFlop/s, CINECA
  − Piz Daint, Cray XC30, 7.8 PFlop/s, CSCS
  − Hazel Hen, Cray XC40, 7.4 PFlop/s, HLRS
  − JuQUEEN, IBM Blue Gene/Q, 5.9 PFlop/s, FZJ
  − SuperMUC, IBM iDataPlex, 3.2 PFlop/s, LRZ
  − Curie, Atos/Bull, 2 PFlop/s, CEA
  − MareNostrum III, IBM iDataPlex, 1.1 PFlop/s, BSC
• Advanced training courses (http://www.training.prace-ri.eu)
• In France: GENCI

[Diagram: the HPC pyramid — tier-0: European centers; tier-1: national centers; tier-2: regional/university centers]
HPC Organisation - GENCI

Grand Equipement National de Calcul Intensif
• Created in 2007
• Several shareholders
  − Ministry of education and research (49%)
  − CNRS (20%)
  − CEA (20%)
  − Universities (10%)
  − INRIA (1%)
• 3 missions
  − National equipment
  − Promote HPC on a European scale
  − Promote numerical simulation for academia and industry
    X Tier-2 centers (Equip@Meso: mesocenters)
    X Initiative HPC-PME
• 3 HPC centers
  − CINES
  − IDRIS
  − TGCC
GENCI - Thematic committees

Research divided into 11 committees
• CT1: Environmental sciences
• CT2a: Non-reactive fluid flows
• CT2b: Reactive or multiphase fluid flows
• CT3: Biology and biomedical sciences
• CT4: Astrophysics and geophysics
• CT5: Theoretical and plasma physics
• CT6: Computer science, algorithms and mathematics
• CT7: Molecular dynamics in biology
• CT8: Quantum chemistry and molecular modeling
• CT9: Physics, chemistry and material properties
• CT10: Transversal applications

[Pie charts: share of compute hours per committee on Ada and on Turing]
GENCI - DARI

How to compute at HPC centers
• www.edari.fr
• ≈ 1.5 billion hours (real cost: ≈ 0.03 €/hour)
• 2 sessions a year (1-year projects)
• Preparatory access (a few tens of thousands of hours, depending on the machine)

Review workflow
• 2 project calls per year
• Scientific evaluation by the 11 committees
• Evaluation committee: attribution proposal
• Attribution committee: arbitration
• Allocation
GENCI - Project

Administrative part
• Project title
• Thematic committee
• Laboratory info.
• Project manager info.
• Technical manager
• Project support (ANR, tier-2 center, industry, ...)

Technical part
• Software description and performance
• Type of machine architecture needed
• How much data space?
• Experience of the team

Scientific part
• Project description
  − Abstract
  − State of the art
  − Research plan
  − Methods
• Justification of the needs
• Bibliography
IDRIS - Presentation

Institut du Developpement et des Ressources en Informatique Scientifique
• Institute for the development and resources in scientific computing
• CNRS UPS851, founded at the end of 1993
• Located in Orsay (≈20 km south-east of Paris, on the Paris-Saclay campus)
• Building from 1969 (previously CIRCE)
• Simultaneously:
  − A computing center equipped with supercomputers among the most powerful
  − An excellence center for HPC
IDRIS - Teams

[Organization chart of the IDRIS teams]
IDRIS - Machines configuration

• Ada (IBM x3750m4): 10,624 cores (2.7 GHz), 332 nodes, 49 TB RAM, Rpeak 233 TFlop/s
• AdaPP (IBM x3850x5): pre/post-processing, 128 cores (2.67 GHz), 4 nodes, 4 TB RAM
• Turing (Blue Gene/Q): 98,304 cores (1.6 GHz), 6,144 nodes, 96 TB RAM, Rpeak 1,258 TFlop/s
• Ergon (IBM): data preservation, 13 nodes, 4 TB RAM
• Oracle robot (STK SL8500): data preservation, 6,300 magnetic tapes, 10 PB
• Storage: 5 PB + 2 PB of disk, 10 GB/s links
IDRIS - User Support

User Support
• Important resources since the beginning of IDRIS
• ≈ 15 engineers
• Hotline
  − By phone and e-mail
  − Monday-Friday, from 9 a.m. to 6 p.m.
  − Follow-up of problems
• Advanced support for projects
  − Porting, optimization (CPU, communications, memory, I/O)
  − From a few days to a few months
• Documentation on www.idris.fr
• Training courses
  − Languages (C, Fortran)
  − Parallelism (OpenMP, MPI, hybrid)
  − Free for academics
  − Industry welcome, but paying (CNRS Formation Entreprises)
IDRIS - Trainings

2017 Trainings
Fortran Basics                          24.01.2017   3 days   Confirmed
MPI                                     30.01.2017   4 days   Confirmed
OpenMP                                  07.03.2017   3 days   Announced
Fortran Advanced                        14.03.2017   4 days   Announced
Fortran Basics                          03.05.2017   3 days   Announced
Hybrid MPI/OpenMP                       10.05.2017   2 days   Announced
C                                       29.05.2017   5 days   Announced
MPI / OpenMP                            12.06.2017   5 days   Announced
Fortran Advanced                        20.06.2017   4 days   Announced
MPI                                     02.10.2017   4 days   Announced
Fortran Basics                          10.10.2017   3 days   Announced
OpenMP                                  17.10.2017   3 days   Announced
HPC debugging                           09.11.2017   2 days   Announced
Fortran Advanced                        14.11.2017   4 days   Announced
Blue Gene/Q porting and optimization    23.11.2017   2 days   Announced
Hybrid MPI/OpenMP                       05.12.2017   2 days   Announced
Fortran 2003                            12.12.2017   3 days   Announced

To register for a training: https://cours.idris.fr/
Some technical aspects that can help to understand how to enhance performance.
Software - Installation

Different environments
• Standard workstation
  − Installation instructions usually given by the developers
• Supercomputer
  − Non-standard paths for libraries
  − Shared folders
  − To be tested for performance

Test case: Quantum Espresso (zeolite, PBE, Γ point, 10 steps)

Compiling
Compiler options   -O0        -O2          -O2 -xAVX -mkl   -O3 -xAVX -parallel -mkl
Elapsed time       3 h 3 min  20 min 28 s  6 min 6 s        9 min

At IDRIS
• Users can compile/use their own versions
  − You can ask for help
• System-wide installations are done by the support team
  − But only official versions
Software - Modules
List of modules available at IDRIS
Job Scheduler - Options

Schedulers
• LoadLeveler (at IDRIS)
• SLURM
• PBS
• Univa Grid Engine
• ...

Usage
• Allows scheduling numerous jobs
• Allocates the resources required
  − Number of nodes, cores
  − Memory
  − Time
• Manages priorities, dependencies, etc.
• Specific syntax (usually shell comments)
• Commands to be run

Example: Ada queue on 01/19
• 330 jobs in queue: 91 waiting, 153 running, 86 held
• 208 users
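For LoadLeveler, the "shell comments" mentioned above are `# @` directives placed at the top of a submission script. A minimal sketch of what such a script might look like (the job name, task count, wall-clock limit, module name and executable are all illustrative, not actual IDRIS settings; the IDRIS documentation lists the real keywords supported on each machine):

```shell
# @ job_name         = qe_zeolite
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ job_type         = parallel
# @ total_tasks      = 32
# @ wall_clock_limit = 1:00:00
# @ queue

# Commands executed once the requested resources are allocated
module load espresso
poe ./pw.x -input zeolite.in
```

The directives reserve the resources (tasks, time); the plain shell lines after `# @ queue` are what actually runs on the allocated nodes.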
Memory - Management

Different kinds of memory
• Cache
  − Close to the core
  − Access ≈ 1 ns (L1)
  − Small (a few kB to a few MB)
• RAM
  − Access still fast (≈ 100 ns)
  − Usually a few GB/core
• SSD disks
  − Access ≈ 10 µs
  − Some hundreds of GB
• HD disks
  − Access ≈ 1 ms
  − Some TB

Why is it important?
• Strongly affects code performance
  − Try to keep everything in RAM
  − The OS uses RAM for disk cache!
  − You should tune memory settings!
• Users should set up their jobs with that in mind
• Some limitations on HPC centers (size and number of files)

Examples
• VASP, 289 atoms, 1024 processes, 2 geometry steps
  − With I/O (≈ 1 GB): 662 s
  − Without I/O: 338 s
• Crystal, 289 atoms, 1024 processes, 1 SCF
  − PCrystal: 12,300 files; crashed after 420 s
  − PCrystal: 1,040 files; 1,239 s
  − PCrystal optimized: 1,040 files; 1,117 s
An important technical aspect: Parallelism
Parallelization - Useful vocabulary

• Speedup: S(n) = T_seq / T_para(n)
• Strong scaling: scaling with a fixed problem size
• Weak scaling: scaling with a problem size linear in the number of tasks

[Plot: strong vs weak scaling curves against #cores]
Parallelization - Laws

Amdahl's law
• Stated in 1967
• Given a fixed problem size (strong scaling)
• What is the evolution of the speedup w.r.t. the number of cores?
• A fraction α of the code is serial
• S(n) = 1 / (α + (1 − α)/n)
• Perfect scaling: S(n) = n
• lim_{n→∞} S(n) = 1/α

[Plot: speedup vs number of cores (1 to 16384) for α = 0.01, 0.1, 0.5, against the ideal S(n) = n. On 1 core the work is α + (1 − α); on n cores the parallel part shrinks to (1 − α)/n while the serial part α stays.]

Quite pessimistic: real applications usually do not have a fixed size.
Parallelization - Laws

Gustafson-Barsis law
• Stated in 1988
• Problem size increases with the number of cores (weak scaling)
• A fraction α of the code is serial
• S(n) = n − α(n − 1)
• Difficult to achieve in real applications
• Optimistic

[Plot: speedup vs number of cores (20 to 120) for α = 0.1, 0.2, 0.5, 0.9. On n cores the parallel part grows to n(1 − α) while the serial part α stays fixed.]
Parallelization - Shared Memory Model

Shared memory
• 1 process creates several threads
• Work is shared between the threads
• All threads have access to the same memory space
• OpenMP example: core Hamiltonian

[Diagram: one process whose threads run on the cores of a node, all sharing the node's memory]

double charge(int);
double distance(int, int);
double KE(int);

void core(int Nelectrons, int Natoms, double* Hc)
{
    #pragma omp parallel for
    for (int i = 0; i < Nelectrons; ++i)
    {
        Hc[i] = -KE(i);
        for (int a = 0; a < Natoms; ++a)
        {
            Hc[i] -= charge(a) / distance(i, a);
        }
    }
}

Pros
• Might be easy to parallelize with
• Even automatically by compilers

Cons
• Intranode only
• Difficult to debug
Parallelization - Distributed Memory Model

Distributed memory
• Several independent processes are created
• Work is shared between the processes
• Each process has direct access only to its own memory
• Interprocess communication goes through the network
• MPI: Message Passing Interface

[Diagram: several cores, each with its own private memory, linked by a network]

Worker side of a master/worker matrix product:

if (taskid > MASTER)
{
    mtype = FROM_MASTER;
    MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&a, rows*NCA, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&b, NCA*NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);

    for (k = 0; k < NCB; k++)
        for (i = 0; i < rows; i++)
        {
            c[i][k] = 0.0;
            for (j = 0; j < NCA; j++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
        }

    mtype = FROM_WORKER;
    MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
    MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
    MPI_Send(&c, rows*NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD);
}

Pros
• Intra- or inter-node

Cons
• Needs explicit programming
• Needs an efficient network
Parallelization - Hybrid OpenMP/MPI

[Diagram: four nodes; on each node one MPI process spawns several OpenMP threads that share the node's memory, while processes on different nodes communicate over the network]

Hybrid programming
• Adapted to modern architectures
• Might increase code performance
  − 2- to 5-fold memory footprint reduction
  − Larger systems for the same cost
• Coding is tricky
Application to chemistry-related software problems
Parallelism - Chemistry

What can you parallelize?

System splitting          Wave function, e− density    Mathematics functions
• Multiple images         • Basis set coefficients     • Functions (FFT, ...)
• NEB                     • Bands                      • Eigen solvers
• Numerical frequencies   • Grid                       • Matrix handling

(Parallelization complexity varies across these levels.)
Parallelism - Grand Challenges 2012

Free-energy surface
• Forcefield
• Amino acids through a membrane
• 15,000 atoms
• 61 replicas
• Gromacs
• 2 µs/day
Bonhenry, Dehez, Tarek (Lorraine University)

Ionic liquid
• DFT
• Structure and properties
• ≈ 900 atoms
• 20 ps
• CP2K/CPMD
Sieffert (Grenoble University)

Li-ion anode
• BigDFT
• Up to 625 atoms
• Surface boundary conditions
S. Krishnan, D. Caliste, T. Deutsch, L. Genovese, P. Pochet (Grenoble University, CEA)
Parallelization - Summary

What you can have
• Reduced time to solution for a fixed-size problem
• More memory
  − Larger systems
  − New kinds of computations

Limitations
• Extensibility is not infinite
• Parallelism overhead
• Balance of the workload between threads and processes
• Overall time consumption is higher
• Code adaptation

Parallelism is critical
• Do not expect improvement from hardware alone
• Fits the architectures of the machines
• Larger simulations
Chemistry at IDRIS
Software - Installed at IDRIS

On Wikipedia: 125 different pieces of software listed
Chemistry - Installed at IDRIS

Name               License                     Access   Shared   Distributed   Accelerators
ADF                Site (6,400 €/year)         Free     Yes      Yes           No
Gaussian           Site (28,750 €/20 years)    Ask      Yes      Yes           Yes
VASP               Support (1,000 €/group)     Req.     No       Yes           Yes
Molpro             Site (10,000 £)             Free     No       Yes           No
Molcas             Site (50,000 SEK)           Free     No       Yes           No
ABinit             GPL                         Free     Yes      Yes           Yes
CPMD               Site                        Free     Yes      Yes           Yes
Crystal            Site (6,800 €/year)         Ask      No       Yes           No
SIESTA             Site                        Ask      No       Yes           No
Quantum ESPRESSO   GPL                         Free     Yes      Yes           Yes
NWChem             ECL                         Free     Yes      Yes           No
CP2K               GPL                         Free     Yes      Yes           Yes
BigDFT             GPL                         Free     Yes      Yes           Yes
Gromacs            GPL                         Free     Yes      Yes           Yes
NAMD               GPL                         Free     Yes      Yes           Yes
LAMMPS             GPL                         Free     No       Yes           Yes
Examples - Turbomole

Details
• Turbomole (local workstation + Curie)
• 3 parallel modes
  X MPI
  X SMP
  X SMP (with MPI)
• 190 atoms
• Geometry optimization

From workstation to computing center
• Large decrease in performance for SMP-MPI
  − Only 2 cores working on Curie
• Vanished after tuning the MPI kickstarter options (core binding: -vapi -affmanual="0x0,...")

3 modes to rule them all: parallel efficiency
• Most efficient: SMP-MPI
• MPI: 1.2× slower
• SMP: 2.5× slower

Test your system/code on the machines
Examples - Crystal

Details
• MPPCrystal
• 298 atoms
• Γ point

Core workload increases; it is stated in the output file.
Examples - Quantum Espresso

Details
• 144 atoms
• PBE
• 30 Ry
• 40 steps

Time needed
# cores           Elapsed time   CPU time
16 MPI            77 min         21 h
50 MPI            42 min         45 h
64 MPI            46 min         49 h
128 MPI           52 min         111 h
100 MPI+OpenMP    29 min         49 h

Diagnosis
• Unbalanced workload
• Some cores do not compute directly

Solution
• Enhance the balance by using OpenMP threads
Take home messages - User support

If there is something strange in
• How your code runs
• Your code installation
• The administration of your account

Who you gonna call?
• IDRIS support
  − By email: [email protected]
  − By phone: 01 69 35 85 55

Larger projects
• Advanced support
  − Large-scale code porting
  − Parallelization of code
  − New algorithm implementation
  − Debugging and optimization
  − Code coupling
Conclusions

Architectures
• More cores (Top500 #1: ≈ 10.6 million cores)
• Less power/CPU (Green500)
• Efficient networks
• Software has to adapt

Tune your computation
• Test the parallelism parameters of your software
• Use memory settings wisely
• Adapt the size of your system

Tips
• Know your software
• Know your environment
• RTFM (Read the famous manual)
Acknowledgment

Thanks to
• Denis Girou
• Fabien Leydier
• Eric Gloaguen
• Pascal Voury
• Pierre-Francois Lavallee
• All the IDRIS support team

And you for your attention!
Any questions? [email protected]