Open Problems of Petascale Molecular-Dynamics Simulations
D. Grancharov, E. Lilkova, N. Ilieva, P. Petkov and L. Litov
Supercomputing Applications in Science and Industry, Sept. 20-21, Sunny Beach, Bulgaria
University of Sofia “St. Kl. Ohridski”, Faculty of Physics
Institute for Nuclear Research and Nuclear Energy – BAS
2
Content
1. Introduction
2. Molecular dynamics in brief
3. ODE integrators
4. Scalability of the MD packages GROMACS and NAMD in simulations of large systems
5. Workload distribution on the computing cores in the MD simulations
6. pp:pme ratio optimization
7. Outlook
Nevena Ilieva, PRACE Regional Conference “Supercomputing Applications in Science and Industry”, High-performance MD-simulations
3
Molecular Dynamics / ODE Integrators / Scalability of GROMACS and NAMD (above 2048 cores) / Workload distribution and dd_order performance / pp:pme optimization / Outlook
Method for investigation of the time evolution of atomic and molecular systems:
- Classical description of the systems;
- Empirical parametrisation of the interaction potential between atoms and molecules – the molecular force field;
- The force field is conservative, depending on atom positions only, and pair-additive (NB: cut-offs, boundary conditions).
$V = V_s + V_a + V_t + V_v + V_e + \dots$
(bond stretching, bond-angle bending, torsion, van der Waals interactions, Coulomb interaction)
4
QM: Schrödinger equation; probability to find the system at $(x,t)$: $P(x,t) = \Psi^*(x,t)\,\Psi(x,t)$

CM: Newton's equation: $F = ma = m\,d^2x/dt^2$, $F = -\nabla_x V$

MD: $F = d(mv)/dt$, $v = dx/dt$
5
Hamiltonian nature of the investigated dynamics
$\dot q = \partial H/\partial p$, $\quad \dot p = -\partial H/\partial q$

Case of quadratic kinetic energy:
$T(q,\dot q) = \tfrac12 \dot q^T M(q)\,\dot q \;\Rightarrow\; T(q,p) = \tfrac12 p^T M(q)^{-1} p$, $\quad H(p,q) = T(p,q) + V(q)$

If, in addition, $H(p,q) = T(p) + V(q)$: $H(p,q)$ is separable.

The flow $\varphi_h$ is a symplectic transformation (Theorem, Poincaré, 1899):
$(q(t), p(t)) \mapsto (q(t+h), p(t+h))$, $\quad \varphi_h'^{\,T} J\, \varphi_h' = J$, $\quad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$

One-step map: $(q_{n+1}, p_{n+1}) = \Phi_{h,H}(q_n, p_n)$
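The symplectic condition $A^T J A = J$ can be illustrated numerically; a minimal sketch (not from the talk), assuming the 1-D harmonic oscillator $H = (p^2+q^2)/2$, whose exact time-$h$ flow is a rotation of the $(q,p)$ plane:

```python
import numpy as np

# Symplectic structure matrix J for one degree of freedom.
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def flow_jacobian(h):
    """Jacobian of the exact flow of H = (p^2 + q^2)/2 over time h.

    For the harmonic oscillator the flow is linear, so the Jacobian
    is the rotation matrix of the flow itself.
    """
    return np.array([[np.cos(h),  np.sin(h)],
                     [-np.sin(h), np.cos(h)]])

A = flow_jacobian(0.37)            # arbitrary step length
residual = A.T @ J @ A - J         # zero iff the map is symplectic
print(np.max(np.abs(residual)))    # ~0 up to round-off
```

Any numerical one-step map can be tested the same way: differentiate the map and check the same matrix identity.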
6
Symplectic, i.e.:
- preserving the symplectic form $\sum_i dq_i \wedge dp_i$
- preserving oriented areas in phase space

Most of the usual numerical methods (primitive Euler, classical Runge–Kutta) are not symplectic integrators.

Encke/Störmer/leap-frog/Verlet: $(q_n, p_n) \to (q_{n+1/2}, p_{n+1/2}) \to (q_{n+1}, p_{n+1})$:
$p_{n+1/2} = p_n + \tfrac{h}{2} F(q_n)$
$q_{n+1} = q_n + h\, M^{-1} p_{n+1/2}$
$p_{n+1} = p_{n+1/2} + \tfrac{h}{2} F(q_{n+1})$

Extensions: fixed step size → variable step size; single-step → multiple-step, multirate.

Issues: resonances with the most rapid vibrational mode; implicit integration algorithms are nonlinear and still suffer from resonances.
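The contrast between a non-symplectic and a symplectic scheme shows up already on a toy system; a minimal sketch (assuming a unit-mass harmonic oscillator, not one of the talk's test systems):

```python
import numpy as np

# Harmonic oscillator, H = p^2/2 + q^2/2, force F(q) = -q, unit mass.
def force(q):
    return -q

def euler_step(q, p, h):
    """Primitive (non-symplectic) explicit Euler."""
    return q + h * p, p + h * force(q)

def verlet_step(q, p, h):
    """Stoermer-Verlet / leap-frog: half-kick, drift, half-kick."""
    p_half = p + 0.5 * h * force(q)
    q_new = q + h * p_half
    p_new = p_half + 0.5 * h * force(q_new)
    return q_new, p_new

def energy(q, p):
    return 0.5 * (p * p + q * q)

h, n = 0.05, 20000
qe, pe = 1.0, 0.0     # Euler trajectory
qv, pv = 1.0, 0.0     # Verlet trajectory
for _ in range(n):
    qe, pe = euler_step(qe, pe, h)
    qv, pv = verlet_step(qv, pv, h)

print(energy(qe, pe))  # grows without bound for explicit Euler
print(energy(qv, pv))  # stays close to the initial value 0.5
```

The bounded energy error of Verlet over arbitrarily long runs is exactly the structure-preservation property that makes symplectic schemes the MD workhorse.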
7
Composition methods:
$\Psi_h := \Phi_{\gamma_s h} \circ \dots \circ \Phi_{\gamma_1 h}$, $\quad \gamma_1 + \dots + \gamma_s = 1$

Splitting methods:
$\dot y = f(y) = f^{[1]}(y) + f^{[2]}(y)$

Ex.: Symplectic Euler & Störmer–Verlet schemes. For $H = T(p) + V(q)$ the two pieces have exact flows
$\varphi_t^T:\; q(t) = q_0 + t\,\nabla T(p_0),\; p(t) = p_0$
$\varphi_t^V:\; q(t) = q_0,\; p(t) = p_0 - t\,\nabla V(q_0)$
Symplectic Euler: $\Phi_h = \varphi_h^V \circ \varphi_h^T$; Störmer–Verlet: $\Phi_h = \varphi_{h/2}^V \circ \varphi_h^T \circ \varphi_{h/2}^V$
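The splitting construction can be sketched directly in code; assuming the same toy Hamiltonian $H = p^2/2 + q^2/2$, composing the two exact sub-flows reproduces symplectic Euler (first order) and the Strang composition, i.e. Störmer–Verlet (second order):

```python
import numpy as np

# Exact sub-flows of the split Hamiltonian H = T(p) + V(q)
# with T = p^2/2, V = q^2/2 (unit-mass harmonic oscillator).
def flow_T(q, p, h):   # drift: q moves, p frozen
    return q + h * p, p

def flow_V(q, p, h):   # kick: p moves, q frozen
    return q, p - h * q

def symplectic_euler(q, p, h):
    """First-order splitting: kick, then drift."""
    q, p = flow_V(q, p, h)
    return flow_T(q, p, h)

def strang(q, p, h):
    """Strang composition V(h/2) o T(h) o V(h/2): Stoermer-Verlet."""
    q, p = flow_V(q, p, 0.5 * h)
    q, p = flow_T(q, p, h)
    return flow_V(q, p, 0.5 * h)

# One-step error against the exact rotation flow.
h = 0.1
q0, p0 = 1.0, 0.0
q_exact = np.cos(h)
e1 = abs(symplectic_euler(q0, p0, h)[0] - q_exact)
e2 = abs(strang(q0, p0, h)[0] - q_exact)
print(e1, e2)   # the Strang (Verlet) step is markedly more accurate
```

Both maps are symplectic by construction, because each is a composition of exact Hamiltonian flows.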
8
Combining exact and numerical flows:
$\Psi_h = \varphi_h^{[2]} \circ \Phi_h^{[1]}$

Splitting in more than two vector fields:
$\dot y = f^{[1]}(y) + f^{[2]}(y) + \dots + f^{[N]}(y)$, $\quad \Psi_h = \varphi_h^{[1]} \circ \varphi_h^{[2]} \circ \dots \circ \varphi_h^{[N]}$

Integrators based on generating functions:
$S(p,q,h) = h\,G_1(p,q) + h^2 G_2(p,q) + \dots + h^r G_r(p,q)$, generating the map $(p_n, q_n) \mapsto (p_{n+1}, q_{n+1})$

Variational integrators:
Discretizing the action integral
9
Integration algorithms with variable time step: improved performance, but degradation of accuracy (trade-off vs. structure preservation):
- accurate trajectories → high-order methods, small time steps;
- high-order methods vs. structural properties (E, P): deficiency in long-term performance;
- complicated, unstable, chaotic trajectories: keep as much structure as possible;
- loss of symplecticness → simple variable / symmetrized time step;
- different time steps in different phase-space regions (computational cost).

Multirate methods (processes → subsystems of the ODE system), e.g. 2-, 3-, and 4-body interactions.
10
$\Gamma$ – point in the phase space of the system; Hamilton's equations → Liouville operator:
$\dot\Gamma = \{\Gamma, H\} = iL\,\Gamma$
in Cartesian coordinates: $H = T + V = \sum_i p_i^2/2m_i + V(x)$
The kinetic and potential parts of the Liouville operator do not commute.
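The Poisson-bracket form of the Liouville operator, and the non-commutativity of its kinetic and potential parts, can be checked symbolically; a sketch using SymPy, with an illustrative observable $f = qp$ and potential $V = q^4$ (both chosen only for this example):

```python
import sympy as sp

q, p, m = sp.symbols('q p m', positive=True)

def poisson(f, g):
    """Poisson bracket {f, g} for one degree of freedom (q, p)."""
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

T = p**2 / (2 * m)        # kinetic part of H
V = sp.Function('V')(q)   # generic potential
H = T + V

# Hamilton's equations recovered as brackets with H:
print(poisson(q, H))      # p/m                  (i.e. qdot)
print(poisson(p, H))      # -Derivative(V(q), q) (i.e. pdot)

# The two pieces of the Liouville operator, D = {., T} and K = {., V},
# do not commute. Concrete illustrative choice: f = q*p, V = q**4.
f, Vc = q * p, q**4
DK = poisson(poisson(f, T), Vc)   # apply D first, then K
KD = poisson(poisson(f, Vc), T)   # apply K first, then D
print(sp.simplify(DK - KD))       # 8*p*q**3/m, nonzero
```

This non-commutativity is exactly why the propagator must be factorized (Trotter) rather than split exactly.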
11
$\dot v = F(x)/m$, $\quad \dot x = v$, $\quad F = -\partial U(x)/\partial x$

One-step propagators; step $\Delta t$; applied to $(x, v)$:
$v_{n+1} = v_n + \dfrac{\Delta t}{2m}\left[F_n + F_{n+1}\right]$
$x_{n+1} = x_n + v_n\,\Delta t + \dfrac{\Delta t^2}{2m} F_n$
12
Force splitting: $F = F_1 + F_2 + \dots + F_M$, $\; F_i = -\partial V_i/\partial r$, with $F_i$ “softer” than $F_{i+1}$.

Liouville-operator splitting: $iL = D + K_1 + \dots + K_M$, with $D = \{\cdot, T\}$, $K_i = \{\cdot, V_i\}$.

Ex.: Trotter factorization of the propagator (outer step $\Delta t_0$, two force classes, $\delta t = \Delta t_0/n$):
$e^{\Delta t_0 (D + K_1 + K_2)} \approx e^{\frac{\Delta t_0}{2}K_2}\left[e^{\frac{\delta t}{2}K_1}\, e^{\delta t D}\, e^{\frac{\delta t}{2}K_1}\right]^n e^{\frac{\Delta t_0}{2}K_2}$

The overall step is $\Delta t_0$, but effectively only down to the $i$-th level if
$K_{i+1} = \dots = K_M = 0$
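The multiple-time-step (RESPA-type) factorization on this slide can be sketched for a hypothetical two-level split, with a stiff “fast” force and a weak “slow” one (both invented for the example, not one of the talk's systems):

```python
# Toy RESPA: outer half-kicks with the slow force, an inner Verlet
# loop with the fast force. The slow force is evaluated only once
# per outer step dt, the fast one n_inner times.
def f_fast(q):
    return -100.0 * q          # stiff, bond-like force (V = 50 q^2)

def f_slow(q):
    return -0.5 * q            # soft, long-range-like force (V = 0.25 q^2)

def respa_step(q, p, dt, n_inner):
    h = dt / n_inner
    p += 0.5 * dt * f_slow(q)          # outer half-kick (slow)
    for _ in range(n_inner):           # inner Verlet loop (fast)
        p += 0.5 * h * f_fast(q)
        q += h * p
        p += 0.5 * h * f_fast(q)
    p += 0.5 * dt * f_slow(q)          # outer half-kick (slow)
    return q, p

def energy(q, p):
    return 0.5 * p * p + 50.0 * q * q + 0.25 * q * q

q, p = 1.0, 0.0
E0 = energy(q, p)
for _ in range(1000):
    q, p = respa_step(q, p, dt=0.04, n_inner=8)
print(abs(energy(q, p) - E0))   # stays small: the scheme is symplectic
```

The outer step must still avoid resonance with the fast mode, which is the resonance limitation mentioned earlier for impulse-type multiple-time-step schemes.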
13
MD-simulation performance for large systems ($10^5$ atoms and more): scalability, distribution of the computational load and its dependence on the functional assignment to the individual processors.

Test systems:
- epidermal growth factor: ~5 x 10^5 atoms
- satellite of the tobacco mosaic virus: ~10^6 atoms
- E. coli ribosome in water: ~2.2 x 10^6 atoms
14
System size | Cores | NAMD CVS 2011-02-19          | GROMACS 4.5.4
(atoms)     |       | Perf. [ns/day] | Speed-up    | Perf. [ns/day] | Speed-up
465 399     | 8 192 | 12.21          | 6.68        | 2.57           | 2.74
            | 4 096 | 10.24          | 5.60        | 5.32           | 5.68
            | 2 048 |  6.08          | 3.33        | 3.08           | 3.29
            | 1 024 |  3.14          | 1.72        | 1.84           | 1.96
            |   512 |  1.83          | 1.00        | 0.94           | 1.00
1 007 930   | 8 192 | 11.37          | 11.08       | –              | –
            | 4 096 |  7.03          |  6.85       | –              | –
            | 2 048 |  3.57          |  3.47       | –              | –
            | 1 024 |  1.97          |  1.92       | –              | –
            |   512 |  1.03          |  1.00       | –              | –
2 233 537   | 8 192 |  5.86          | 12.53       | –              | –
            | 4 096 |  3.44          |  7.36       | –              | –
            | 2 048 |  1.80          |  3.84       | –              | –
            | 1 024 |  0.92          |  1.97       | –              | –
            |   512 |  0.47          |  1.00       | –              | –
Compiled with the IBM XL compilers for the architecture of the BlueGene/P computing nodes; NAMD used compressed input data. A peculiarity in the way the data is loaded into the RAM of the IBM BlueGene/P computing cores limits GROMACS to systems of up to ~700 000 atoms (hence no GROMACS entries for the two larger systems).
15
Performance: the simulated time obtained in 24 hours with an integration step of 2 fs.
Speed-up: performance normalised to the performance at 512 computing cores as reference value.
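With these definitions, the speed-up and the implied parallel efficiency follow directly from the tabulated performance values; e.g., for the NAMD runs on the 465 399-atom system (small differences to the tabulated speed-ups come from rounding of the performance values):

```python
# Speed-up and parallel efficiency, computed from the NAMD
# performance column of the table for the 465 399-atom system.
perf_ns_per_day = {512: 1.83, 1024: 3.14, 2048: 6.08,
                   4096: 10.24, 8192: 12.21}
ref = perf_ns_per_day[512]                    # 512-core reference run

speedup = {c: round(v / ref, 2) for c, v in perf_ns_per_day.items()}
print(speedup)          # e.g. 6.67 at 8192 cores

# Efficiency relative to ideal linear scaling from 512 cores:
efficiency = {c: round((v / ref) / (c / 512), 2)
              for c, v in perf_ns_per_day.items()}
print(efficiency)       # scaling degrades sharply beyond 2048 cores
```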
16
Distribution of the system parts among the computing cores:
- particle decomposition (~ N x N/2)
- domain decomposition
- long-range interactions: PME algorithm

SCALASCA profiling tool: guides the optimization of parallel programs by measuring and analyzing their behavior during the run:
- instrumentation of the code
- running the instrumented code
- data analysis (Cube 3)
- estimation of the efficiency, speed and parallelization behavior of the algorithms in use
17
Distribution of the communications: (a) interleave; (b) pp_pme; (c) Cartesian (red: higher intensity, yellow: lower intensity).
Test system: 103 079 atoms; 10 000 steps x 2 fs = 20 ps; PBC; Berendsen thermostat; no LINCS or P-LINCS used.
18
Performance [ns/day] by dd_order regime and number of computing cores:

Cores | interleave (default) | pp_pme | Cartesian
  512 |  6.672               |  6.592 |  6.600
1 024 | 12.122               | 11.905 | 11.973
2 048 | 20.856               | 20.627 | 20.426
4 096 | 27.994               | 31.306 | 31.544

Up to 2048 cores the three regimes show similar performance; on 4096 cores the default mode is the slowest one.
19
Test system of ~465 000 atoms, 200 steps; 1/8 of all cores are PME cores; runs on 512 and 1024 cores (51 GB of output data on 1024 cores).

512 cores:
- total time 3 x 10^6 s; execution time 2.3 x 10^6 s
- t/core (average) 4668 s (PME: 5888 s; PP: 4494 s)
- ~70% do_md; ~30% long-range electrostatics & domain decomposition on the cores
- communications: 3.18 x 10^6 (do_md ~58%; long-range electrostatics ~21%; initialization of environment ~20%); PME: 10 453 & PP: 4 174
20
1024 cores:
- total time 3 x 10^6 s; execution time 2.3 x 10^6 s
- t/core (average) 4912 s (PME: 6829 s; PP: 4637 s)
- ~66.6% do_md; ~30% long-range electrostatics & domain decomposition on the cores
- communications: 7.1 x 10^6 (do_md ~60%; long-range electrostatics ~22%; initialization of environment ~18%); PME: 13 600 & PP: 6 100
21
Test system of ~200 000 atoms, 2000 steps; pme:pp ratio varied from 1:1 to 1:3 (16 → 8 out of 32 cores) with g_tune_pme; cut-off radius varied from 0.9 nm to 1.15 nm.

Strong case-dependence of the most appropriate parameter set.
22
The increasing size and complexity of the investigated objects strongly press for a reconsideration of the existing algorithms, not only because of the exploding computation volumes but also because of the poor scalability with the number of processors employed.

The performed investigations allow us to clearly identify the main reasons for the increase of communication between the computing cores, and thus for the damping of the scalability of the code.

The multiple-time-step symplectic integration algorithm we are working on aims at resolving this problem.
23
24
- T.F. Miller III, M. Eleftheriou, P. Pattnaik, A. Ndirango, D. Newns, and G.J. Martyna, J. Chem. Phys. 116 (2002) 8649.
- S. Nosé, J. Phys. Soc. Jpn. 70 (2001) 75.
- R. Skeel, J.J. Biesiadecki, Ann. Num. Math. 1 (1994) 1–9.
- D. Janezic and M. Praprotnik, J. Chem. Inf. Comput. Sci. 43 (2003) 1922–1927.
- M. Tao, H. Owhadi, J.E. Marsden, Symplectic, linearly-implicit and stable integrators with applications to fast symplectic simulations of constrained dynamics, e-print arXiv:1103.4645 (2011).
- Wei He and Sanjay Govindjee, Application of a SEM Preserving Integrator to Molecular Dynamics, Rep. No. UCB/SEMM-2009/01, Jan. 2009, Univ. of California, Berkeley, 27 pp.
- E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations (Springer, Heidelberg, 2nd ed., 2004).
25
- R.S. Herbst, Int. J. Radiat. Oncol. Biol. Phys. 59 (2 Suppl) (2004) 21–26.
- H. Zhang, A. Berezov, Q. Wang, G. Zhang, J. Drebin, R. Murali, M.I. Greene, J. Clin. Invest. 117/8 (2007) 2051–2058.
- F. Walker, L. Abramowitz, D. Benabderrahmane, X. Duval, V.R. Descatoire, D. Hénin, T.R.S. Lehy, T. Aparicio, Human Pathology 40/11 (2009) 1517–1527.
- http://www.ks.uiuc.edu/Research/STMV/
- E. Villa et al., Proc. Natl. Acad. Sci. USA 106 (2009) 1063–1068.
- K.Y. Sanbonmatsu and C.-S. Tung, Journal of Physics: Conference Series 46 (2006) 334–342.
26
Spare slides
L. Litov Computer aided drug design
Error at every step
Accumulated error
Algorithms for solving the equation of motion