TRANSCRIPT
Opportunities for Biological Consortia on HPCx
Code Capabilities and Performance
HPCx and CCP Staff
http://www.ccp.ac.uk/
http://www.hpcx.ac.uk/
Royal Institution, 6th November 2003 – HPCx/Biology Discussions
Welcome to the Meeting
• Background
  – HPCx
• Objectives
  – to consider whether there is a case to bid
• Agenda
  – Introduction to the HPCx service
  – Overview of Code Performance
  – Contributed Presentations
  – Invited Presentation
  – Discussion
Outline
• Overview of Code Capabilities and Performance
– Macromolecular simulation
  • DL_POLY, AMBER, CHARMM, NAMD
– Localised basis molecular codes
  • Gaussian, GAMESS-UK, NWChem
– Local basis periodic code
  • CRYSTAL
– Plane wave periodic codes
  • CASTEP
  • CPMD (Alessandro Curioni talk)
• Note - consortium activity is not limited to these codes.
The DL_POLY Molecular Dynamics Simulation Package
Bill Smith
DL_POLY Background
• General purpose parallel MD code
• Developed at Daresbury Laboratory for CCP5, 1994 to today
• Available free of charge (under licence) to university researchers world-wide
• DL_POLY versions:
  – DL_POLY_2
    • Replicated Data, up to 30,000 atoms
    • Full force field and molecular description
  – DL_POLY_3
    • Domain Decomposition, up to 1,000,000 atoms
    • Full force field but no rigid body description
DL_POLY Force Field
• Intermolecular forces
  – All common van der Waals potentials
  – Sutton-Chen many-body potential
  – 3-body angle forces (SiO2)
  – 4-body inversion forces (BO3)
  – Tersoff potential -> Brenner
• Intramolecular forces
  – Bonds, angles, dihedrals, inversions
• Coulombic forces
  – Ewald* & SPME (3D), HK Ewald* (2D), adiabatic shell model, reaction field, neutral groups*, truncated Coulombic
• Externally applied fields
  – Walled cells, electric field, shear field, etc.
* Not in DL_POLY_3
Boundary Conditions
• None (e.g. isolated macromolecules)
• Cubic periodic boundaries
• Orthorhombic periodic boundaries
• Parallelepiped periodic boundaries
• Truncated octahedral periodic boundaries*
• Rhombic dodecahedral periodic boundaries*
• Slabs (i.e. x,y periodic, z nonperiodic)
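For the periodic cases above, interatomic separations are evaluated under the minimum image convention. A minimal sketch for an orthorhombic box (illustrative code, not the DL_POLY implementation; function name is hypothetical):

```python
import numpy as np

def minimum_image(r_ij, box):
    """Return the minimum image of a displacement vector r_ij in an
    orthorhombic periodic box with edge lengths box = (Lx, Ly, Lz)."""
    return r_ij - box * np.round(r_ij / box)

# Two atoms near opposite faces of a 10 A cubic box: the nearest
# periodic image is 1 A away, not 9 A.
box = np.array([10.0, 10.0, 10.0])
print(minimum_image(np.array([9.0, 0.0, 0.0]), box))  # [-1.  0.  0.]
```

The same rounding trick covers cubic and orthorhombic cells; triclinic (parallelepiped) cells need the fractional-coordinate generalisation.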
Algorithms and Ensembles
Algorithms
• Verlet leapfrog
• RD-SHAKE
• Euler-Quaternion*
• QSHAKE*
• [All combinations]
* Not in DL_POLY_3
Ensembles
• NVE
• Berendsen NVT
• Hoover NVT
• Evans NVT
• Berendsen NPT
• Hoover NPT
• Berendsen NT
• Hoover NT
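The Verlet leapfrog integrator listed above advances positions and half-step velocities alternately. A minimal NVE sketch (illustrative only, not DL_POLY source):

```python
import numpy as np

def leapfrog_step(x, v_half, force, m, dt):
    """One Verlet leapfrog step: velocities live at half-steps.
    v(t+dt/2) = v(t-dt/2) + (f(t)/m)*dt ;  x(t+dt) = x(t) + v(t+dt/2)*dt
    """
    v_new = v_half + (force(x) / m) * dt
    x_new = x + v_new * dt
    return x_new, v_new

# Harmonic oscillator (k = m = 1), analytic period 2*pi: the particle
# returns close to its starting point after one period, since the
# scheme is symplectic and conserves energy over long runs.
x, v = 1.0, 0.0
dt = 0.01
for _ in range(int(round(2 * np.pi / dt))):
    x, v = leapfrog_step(x, v, lambda q: -q, 1.0, dt)
```

Constraint algorithms such as RD-SHAKE and QSHAKE are applied after each such unconstrained step to restore fixed bond lengths.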
[Diagram: simulation cell decomposed into domains A, B, C and D]
Migration from Replicated to Distributed Data – DL_POLY_3: Domain Decomposition
• Distribute atoms and forces across the nodes
  – More memory efficient, can address much larger cases (10⁵–10⁷ atoms)
• SHAKE and short-range forces require only neighbour communication
  – communications scale linearly with the number of nodes
• Coulombic energy remains global
  – strategy depends on problem and machine characteristics
  – Adopt the Smooth Particle Mesh Ewald (SPME) scheme
    • includes Fourier transform of the smoothed charge density (reciprocal space grid typically 64x64x64 to 128x128x128)
• Conventional routines (e.g. fftw) assume plane or column distributions
• A global transpose of the data is required to complete the 3D FFT and additional costs are incurred re-organising the data from the natural block domain decomposition.
• An alternative FFT algorithm has been designed to reduce communication costs.
– the 3D FFT is performed as a series of 1D FFTs, each involving communications only between blocks in a given column
– More data is transferred, but in far fewer messages
– Rather than all-to-all, the communications are column-wise only
[Diagram: plane vs. block data distributions]
Migration from Replicated to Distributed Data – DL_POLY_3: Coulomb Energy Evaluation
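The column-wise scheme relies on the fact that a 3D FFT factorises exactly into successive 1D FFTs along each axis, so each pass needs data from only one column of blocks. A small numpy check of this identity:

```python
import numpy as np

# Toy "smoothed charge density" on a small reciprocal-space grid.
rng = np.random.default_rng(0)
rho = rng.standard_normal((8, 8, 8))

# Perform the 3D transform as three passes of 1D FFTs, one axis at a
# time; in the parallel scheme each pass communicates only within one
# column of blocks of the domain decomposition.
out = rho.astype(complex)
for axis in range(3):
    out = np.fft.fft(out, axis=axis)

assert np.allclose(out, np.fft.fftn(rho))  # identical to the direct 3D FFT
```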
DL_POLY_2 & 3 Differences
• Rigid bodies not in _3
• MSD not in _3
• Tethered atoms not in _3
• Standard Ewald not in _3
• HK_Ewald not in _3
• DL_POLY_2 I/O files work in _3 but NOT vice versa
• No multiple timestep in _3
DL_POLY_2 Developments
• DL_MULTI - Distributed multipoles
• DL_PIMD - Path integral (ionics)
• DL_HYPE - Rare event simulation
• DL_POLY - Symplectic versions 2/3
• DL_POLY - Multiple timestep
• DL_POLY - F90 re-vamp
DL_POLY_3 on HPCx
• Test case 1 (552,960 atoms, 300 timesteps)
  – NaKSi2O5 disilicate glass
  – SPME (128³ grid) + 3-body terms, 15,625 link cells
  – 32-512 processors (4-64 nodes)
DL_POLY_3 on HPCx
• Test case 2 (792,960 atoms, 10 timesteps)
  – 64 × Gramicidin (354) + 256,768 H2O
  – SHAKE + SPME (256³ grid), 14,812 link cells
  – 16-256 processors (2-32 nodes)
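Both test cases quote link-cell (LC) counts: the short-range force search assigns atoms to cells no smaller than the cutoff, so only a cell and its immediate neighbours need scanning. A minimal illustrative sketch (function and variable names are hypothetical, not DL_POLY's):

```python
import numpy as np

def build_link_cells(positions, box, r_cut):
    """Assign atoms to link cells with edge >= r_cut, so that every
    pair within the cutoff is found by scanning a cell and its
    immediate neighbours only."""
    n_cells = np.maximum((box / r_cut).astype(int), 1)  # cells per axis
    cell_len = box / n_cells
    cells = {}
    for i, r in enumerate(positions):
        idx = tuple((np.mod(r, box) // cell_len).astype(int))
        cells.setdefault(idx, []).append(i)
    return cells, n_cells

rng = np.random.default_rng(1)
box = np.array([20.0, 20.0, 20.0])
pos = rng.uniform(0.0, 20.0, size=(1000, 3))
cells, n_cells = build_link_cells(pos, box, r_cut=4.0)
assert tuple(n_cells) == (5, 5, 5)                 # 5x5x5 cell grid
assert sum(len(v) for v in cells.values()) == 1000  # every atom placed
```

Because each cell maps naturally onto a spatial domain, this is also the basis of the domain decomposition described earlier.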
DL_POLY People
• Bill Smith – DL_POLY_2 & _3 & GUI – [email protected]
• Ilian Todorov – DL_POLY_3 – [email protected]
• Maurice Leslie – DL_MULTI – [email protected]
• Further Information:
  – W. Smith and T.R. Forester, J. Molec. Graphics (1996), 14, 136
  – http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml
  – W. Smith, C.W. Yong, P.M. Rodger, Molecular Simulation (2002), 28, 385
AMBER, NAMD and Gaussian
Lorna Smith and Joachim Hein
AMBER
• AMBER (Assisted Model Building with Energy Refinement)
  – A molecular dynamics program, particularly for biomolecules
  – Weiner and Kollman, University of California, 1981
• Current version: AMBER7
• Widely used suite of programs
  – Sander, Gibbs, Roar
• Main program for molecular dynamics: Sander
  – Basic energy minimiser and molecular dynamics
  – Shared memory version: only for SGI and Cray
  – MPI version: master/slave, replicated data model
AMBER - Initial Scaling
[Chart: speed-up vs. number of processors (0-144)]
• Factor IX protein with Ca++ ions – 90,906 atoms
Current developments - AMBER
• Bob Duke
  – Developed a new version of Sander on HPCx
  – Originally called AMD (Amber Molecular Dynamics)
  – Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)
• Substantial rewrite of the code
  – Converted to Fortran90, removed multiple copies of routines, …
  – Likely to be incorporated into AMBER8
• We are looking at optimising the collective communications – the reduction / scatter
Optimisation – PMEMD
[Chart: time (seconds) vs. number of processors (up to 300) for PMEMD and Sander7]
NAMD
• NAMD
  – molecular dynamics code designed for high-performance simulation of large biomolecular systems
  – Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign
• Versions 2.4, 2.5b and 2.5 available on HPCx
• One of the first codes to be awarded a capability incentive rating – bronze
NAMD Performance
• Benchmarks from Prof Peter Coveney
• TCR-peptide-MHC system
NAMD Performance
Molecular Simulation - NAMD Scaling
[Chart: NAMD speed-up vs. number of CPUs (up to 512) for IBM SP/Regatta-H and Compaq AlphaServer ES45/1000, against linear scaling]
• standard NAMD ApoA-I benchmark, a system comprising 92,442 atoms, with 12Å cutoff and PME every 4 time steps
• scalability improves with larger simulations – speed-up of 778 on 1024 CPUs of TCS-1 in a 327K particle simulation of F1-ATPase
• Parallel, object-oriented MD code
• High-performance simulation of large biomolecular systems
• Scales to 100s of processors on high-end parallel platforms
http://www.ks.uiuc.edu/Research/namd/
Performance Comparison
• Performance comparison between AMBER, CHARMM and NAMD
• See: http://www.scripps.edu/brooks/Benchmarks/
• Benchmark
  – dihydrofolate reductase protein in an explicit water bath with cubic periodic boundary conditions
  – 23,558 atoms
Performance
Gaussian
• Gaussian 03
  – Performs semi-empirical and ab initio molecular orbital calculations
  – Gaussian Inc., www.gaussian.com
• Shared memory version available on HPCx
  – Limited to the size of a logical partition (8 processors)
  – Phase 2 upgrade will allow access to 32 processors
• Task farming option
CRYSTAL and CASTEP
Ian Bush and Martin Plummer
Crystal
• Electronic structure and related properties of periodic systems
• All electron, local Gaussian basis set, DFT and Hartree-Fock
• Under continuous development since 1974
• Distributed to over 500 sites world wide
• Developed jointly by Daresbury and the University of Turin
Properties
• Energy
• Structure
• Vibrations (phonons)
• Elastic tensor
• Ferroelectric polarisation
• Piezoelectric constants
• X-ray structure factors
• Density of States / Bands
• Charge/Spin Densities
• Magnetic Coupling
• Electrostatics (V, E, EFG classical)
• Fermi contact (NMR)
• EMD (Compton, e-2e)
Crystal Functionality
• Basis Set
  – LCAO - Gaussians
  – All electron or pseudopotential
• Hamiltonian
  – Hartree-Fock (UHF, RHF)
  – DFT (LSDA, GGA)
  – Hybrid functionals (B3LYP)
• Techniques
  – Replicated data parallel
  – Distributed data parallel
  – Direct SCF
• Forces
  – Structural optimization
• Visualisation
  – AVS GUI (DLV)
Benchmark Runs on Crambin
• Very small protein from Crambe abyssinica – 1,284 atoms per unit cell
• Initial studies using STO-3G (3,948 basis functions)
• Improved to 6-31G** (12,354 functions)
• All calculations Hartree-Fock
• As far as we know, the largest HF calculation ever converged
Crambin - Parallel Performance
• Fit measured data to Amdahl's law to obtain an estimate of speed-up
• Increasing the basis set size increases the scalability
• About 700× speed-up on 1024 processors for 6-31G**
• Takes about 3 hours instead of about 3 months
• 99.95% parallel
[Chart: speed-up vs. number of processors (up to 1024), against linear scaling, for 6-31G* (12,354 GTOs), 6-31G (7,194 GTOs) and STO-3G (3,948 GTOs)]
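The Amdahl's-law fit quoted above follows directly from the parallel fraction p: the speed-up on n processors is 1/((1-p) + p/n). A quick check of the 99.95% figure (illustrative helper, not CRYSTAL code):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speed-up on n processors when a fraction p of the
    work parallelises perfectly and (1 - p) remains serial."""
    return 1.0 / ((1.0 - p) + p / n)

# 99.95% parallel on 1024 processors gives roughly the quoted speed-up.
s = amdahl_speedup(0.9995, 1024)
print(round(s))  # 677, in line with the ~700 reported above
```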
Results – Electrostatic Potential
• Charge density isosurface coloured according to potential
• Useful to determine possible chemically active groups
Futures - Rusticyanin
• Rusticyanin (Thiobacillus ferrooxidans) has 6,284 atoms and is involved in redox processes
• We have just started calculations using over 33,000 basis functions
• In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants
What is Castep?
• First principles (DFT) materials simulation code
  – electronic energy
  – geometry optimization
  – surface interactions
  – vibrational spectra
  – materials under pressure, chemical reactions
  – molecular dynamics
• Method (direct minimization)
  – plane wave expansion of valence electrons
  – pseudopotentials for core electrons
HPCx: biological applications
• Examples currently include:
  – NMR of proteins
  – hydroxyapatite (major component of bone)
  – chemical processes following stroke
• Possibility of treating systems with a few hundred atoms on HPCx
• May be used in conjunction with classical codes (e.g. DL_POLY) for detailed QM treatment of 'features of interest'
Castep 2003 HPCx performance gain
[Chart: job time (seconds) vs. total number of processors (80-320) for an Al2O3 120-atom cell with 5 k-points: Jan-03 code vs. current 'best']
Castep 2003 HPCx performance gain
[Chart: job time (seconds) vs. total number of processors (128-512) for an Al2O3 270-atom cell with 2 k-points: Jan-03 code vs. current 'best']
HPCx: biological applications
• Castep (version 2) is written by:
  – M. Segall, P. Lindan, M. Probert, C. Pickard, P. Hasnip, S. Clark, K. Refson, V. Milman, B. Montanari, M. Payne
  – 'Easy' to understand top-level code
• Castep is fully maintained and supported on HPCx
• Castep is distributed by Accelrys Ltd
• Castep is licensed free to UK academics by the UKCP consortium (contact [email protected])
CHARMM, NWChem and GAMESS-UK
Paul Sherwood
[Diagram: single shared data structure vs. physically distributed data]
NWChem
• Objectives
  – Highly efficient and portable MPP computational chemistry package
  – Distributed data: scalable with respect to chemical system size as well as MPP hardware size
  – Extensible architecture
    • Object-oriented design: abstraction, data hiding, handles, APIs
    • Parallel programming model: non-uniform memory access, global arrays
• Infrastructure
  – GA, Parallel I/O, RTDB, MA, …
  – Wide range of parallel functionality essential for HPCx
• Tools
  – Global Arrays:
    • portable distributed data tool
    • used by CCP1 groups (e.g. MOLPRO)
  – PeIGS:
    • parallel eigensolver
    • guaranteed orthogonality of eigenvectors
Distributed Data SCF
Pictorial representation of the iterative SCF process in (i) a sequential process, and (ii) a distributed data parallel process: MOAO represents the molecular orbitals, P the density matrix and F the Fock or Hamiltonian matrix.
[Flow diagrams: SCF cycle – guess orbitals; build F from the integrals (V1e, VCoul, VXC); diagonalise; form MOAO and P; repeat if not converged. The sequential version uses dgemm and a sequential eigensolver; the distributed data version uses ga_dgemm and PeIGS]
NWChem
NWChem Capabilities (direct, semi-direct and conventional):
– RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives
– DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives
– CASSCF; analytic 1st and numerical 2nd derivatives
– Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st derivatives and numerical 2nd derivatives
– Coupled cluster, CCSD and CCSD(T) using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy
– Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources
Case Studies - Zeolite Fragments
• DFT calculations with Coulomb fitting
  – Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H
  – Fitting basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H
• NWChem & GAMESS-UK
  – Both codes use an auxiliary fitting basis for the Coulomb energy, with 3-centre 2-electron integrals held in core.
• Fragments studied:
  – Si8O7H18 347/832
  – Si8O25H18 617/1444
  – Si26O37H36 1199/2818
  – Si28O67H30 1687/3928
[Charts: DFT Coulomb fit with NWChem – measured time (seconds) at 32, 64 and 128 CPUs for Si26O37H36 (1199/2818) and Si28O67H30 (1687/3928) on CS7 AMD K7/1000 + SCI, CS9 P4/2000 + Myrinet 2k, CS2 QSNet Alpha Cluster/667, SGI Origin 3800/R14k-500, IBM SP/p690 and AlphaServer SC ES45/1000]
Memory-driven Approaches: NWChem - DFT (LDA): Performance on the IBM SP/p690
Zeolite ZSM-5
• DZVP basis (DZV_A2) and Dgauss A1_DFT fitting basis
  – AO basis: 3554; CD basis: 12713
• 3-centre 2e-integrals = 1.00 x 10¹²
• Schwarz screening = 6.54 x 10⁹
• % 3c 2e-ints. in core = 100%
• IBM SP/p690 wall time (13 SCF iterations):
  – 64 CPUs = 9,184 seconds
  – 128 CPUs = 3,966 seconds
• MIPS R14k-500 CPUs (Teras) wall time (13 SCF iterations):
  – 64 CPUs = 5,242 seconds
  – 128 CPUs = 3,451 seconds
† M.F. Guest, J.H. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V. Bonacic-Koutecky van Lenthe, J. Kendrick, K. Schoffel & P. Sherwood, with contributions from R.D., W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone and D. Tozer.
GAMESS-UK
• GAMESS-UK is the general purpose ab initio molecular electronic structure program for performing SCF-, MCSCF- and DFT-gradient calculations, together with a variety of techniques for post Hartree Fock calculations.
– The program is derived from the original GAMESS code, obtained from Michel Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC), and has been extensively modified and enhanced over the past decade.
– This work has included contributions from numerous authors†, and has been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in the on-going development and support of the program include various academic funding agencies in the Netherlands, and ICI plc.
• Additional information on the code may be found from links at:http://www.dl.ac.uk/CFS
GAMESS-UK features 1.
– Hartree-Fock:
  • Segmented/GC + spherical harmonic basis sets
  • SCF energies and gradients: conventional, in-core, direct
  • SCF frequencies: numerical and analytic 2nd derivatives
  • Restricted, unrestricted open shell SCF and GVB
– Density Functional Theory:
  • Energies + gradients, conventional and direct, including Dunlap fit
  • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 & LDA functionals
  • Numerical 2nd derivatives (analytic implementation in testing)
– Electron Correlation:
  • MP2 energies, gradients and frequencies; multi-reference MP2; MP3 energies
  • MCSCF and CASSCF energies, gradients and numerical 2nd derivatives
  • MR-DCI energies, properties and transition moments (semi-direct module)
  • CCSD and CCSD(T) energies
  • RPA (direct) and MCLR excitation energies / oscillator strengths; RPA gradients
  • Full-CI energies
  • Green's function calculations of IPs
  • Valence bond (Turtle)
GAMESS-UK features 2.
– Molecular Properties:
  • Mulliken and Lowdin population analysis, electrostatic potential-derived charges
  • Distributed multipole analysis, Morokuma analysis, multipole moments
  • Natural Bond Orbital (NBO) + Bader analysis
  • IR and Raman intensities, polarizabilities & hyperpolarizabilities
  • Solvation and embedding effects (DRF)
  • Relativistic effects (ZORA)
– Pseudopotentials:
  • Local and non-local ECPs
– Visualisation: tools include CCP1 GUI
– Hybrid QM/MM (ChemShell + CHARMM QM/MM)
– Semi-empirical: MNDO, AM1, and PM3 Hamiltonians
– Parallel Capabilities:
  • MPP and SMP implementations (GA tools)
  • SCF/DFT energies, gradients, frequencies
  • MP2 energies and gradients
  • Direct RPA
Parallel Implementation of GAMESS-UK
• Extensive use of Global Array (GA) tools and parallel linear algebra from the NWChem project (EMSL)
• SCF and DFT
  – Replicated data, but …
  – GA tools for caching of I/O for restart and checkpoint files
  – Storage of 2-centre 2-e integrals in DFT J-fit
  – Linear algebra (via PeIGS, DIIS/MMOs, inversion of the 2c-2e matrix)
• SCF and DFT second derivatives
  – Distribution of <vvoo> and <vovo> integrals via GAs
• MP2 gradients
  – Distribution of <vvoo> and <vovo> integrals via GAs
• Direct RPA excited states
  – Replicated data with parallelisation of direct integral evaluation
GAMESS-UK: DFT Calculations
• Cyclosporin (DFT B3LYP), basis: 6-31G* (1,855 GTOs)
• Valinomycin (DFT HCTH), basis: DZVP2_A2 (Dgauss) (1,620 GTOs)
[Charts: speed-up and elapsed time (seconds) vs. number of CPUs (32-128) on SGI Origin 3800/R14k-500, IBM SP/Regatta-H and AlphaServer ES45/1000]
DFT Analytic 2nd Derivatives Performance
IBM SP/p690, HP/Compaq SC ES45/1000 and SGI O3800
(C6H4(CF3))2: basis 6-31G (196 GTOs)
Terms from MO 2e-integrals in GA storage (CPHF & perturbed Fock matrices); calculation dominated by CPHF.
[Chart: elapsed time (seconds) at 32, 64 and 128 CPUs for CS14 PIII/1000 + Myrinet (1 CPU), SGI Origin3800/R14k-500 (B3LYP), IBM SP/p690 (B3LYP and HCTH) and AlphaServer ES45/1000 (B3LYP and HCTH)]
CHARMM
• CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a general purpose molecular mechanics, molecular dynamics and vibrational analysis package for modelling and simulation of the structure and behaviour of macromolecular systems (proteins, nucleic acids, lipids etc.)
• Supports energy minimisation and MD approaches using a classical parameterised force field.
• J. Comp. Chem. 4 (1983) 187-217
• Parallel Benchmark - MD Calculation of Carboxy Myoglobin (MbCO) with 3830 Water Molecules.
• QM/MM model for study of reacting species
  – incorporate the QM energy as part of the system into the force field
  – coupling between GAMESS-UK (QM) and CHARMM
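Where the QM/MM boundary cuts a covalent bond, a common coupling detail is to cap the QM region with a hydrogen link atom placed along the severed bond (compare the "2 link" atoms in the TIM example later). A minimal sketch of link-atom placement, with an assumed C-H capping distance; this is illustrative only, not the GAMESS-UK/CHARMM interface:

```python
import numpy as np

def place_link_atom(r_qm, r_mm, d_link=1.09):
    """Place a hydrogen link atom on the severed QM-MM bond, d_link
    Angstrom (a typical C-H bond length) from the QM boundary atom."""
    bond = r_mm - r_qm
    return r_qm + d_link * bond / np.linalg.norm(bond)

# A bond cut along x: the capping H sits 1.09 A from the QM atom.
h = place_link_atom(np.array([0.0, 0.0, 0.0]), np.array([1.5, 0.0, 0.0]))
print(h)  # [1.09 0.   0.  ]
```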
Parallel CHARMM Benchmark
Benchmark MD calculation of carboxy myoglobin (MbCO) with 3,830 water molecules: 14,026 atoms, 1,000 steps (1 ps), 12-14 Å shift.
[Charts: speed-up vs. number of CPUs (8-64) for CS1 PIII/450 + FE/LAM, CS2 QSNet Alpha Cluster/667, CS10 P4/2666 + Myrinet, Cray T3E/1200E and SGI Origin 3800/R14k-500; measured time (seconds) at 16, 32 and 64 CPUs for CS2 QSNet Alpha Cluster/667, CS9 P4/2000 + Myrinet 2k, CS12 P4/2400 + Gbit Ether, CS10 P4/2666 + Myrinet, SGI Origin 3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/p690]
QM/MM Applications
Triosephosphate isomerase (TIM)
• Central reaction in glycolysis: catalytic interconversion of DHAP to GAP
• Demonstration case within QUASI (partners UZH and BASF)
• QM region: 35 atoms (DFT BLYP)
  – include residues with possible proton donor/acceptor roles
  – GAMESS-UK, MNDO, TURBOMOLE
• MM region: 4,180 atoms + 2 link atoms
  – CHARMM force field, implemented in CHARMM, DL_POLY
T128 (IBM SP/Regatta-H) = 143 secs
[Chart: measured time (seconds) at 8, 16, 32 and 64 CPUs for CS9 P4/2000 + Myrinet 2k, SGI Origin3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/Regatta-H]
– Multiple independent simulations
– Replica exchange: Monte Carlo exchange of configurations between an ensemble of replicas at different temperatures
– Combinatorial approach to ligand binding
– Replica path method: simultaneously optimise a series of points defining a reaction path or conformational change, subject to path constraints
  • Suitable for QM and QM/MM Hamiltonians
  • Parallelisation per point
  • Communication is limited to adjacent points on the path (global sum of energy function)
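The replica exchange move above accepts a swap between two replicas with the standard Metropolis criterion P = min(1, exp[(βi - βj)(Ei - Ej)]). A minimal sketch (illustrative, not the CHARMM implementation; function name is hypothetical):

```python
import math
import random

def accept_swap(e_i, e_j, t_i, t_j, k_b=1.0, rng=random.random):
    """Metropolis acceptance for exchanging configurations between two
    replicas with energies e_i, e_j at temperatures t_i, t_j:
        P = min(1, exp[(beta_i - beta_j) * (e_i - e_j)])
    """
    beta_i, beta_j = 1.0 / (k_b * t_i), 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0 or rng() < math.exp(delta)

# A hotter replica (600 K) holding a lower-energy configuration than a
# colder one (300 K) always swaps it down.
assert accept_swap(e_i=-5.0, e_j=-20.0, t_i=300.0, t_j=600.0)
```

Only energies and a random number cross between the two replicas, which is why the scheme parallelises so naturally over many weakly coupled simulations.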
[Diagram: replica path points P0-P36 along the energy E vs. reaction co-ordinate curve]
Collaboration with Bernie Brooks (NIH): http://www.cse.clrc.ac.uk/qcg/chmguk
Sampling Methods
Summary
• Many of the codes used by the community have quite poor scaling
• Best cases
  – large quantum calculations (CRYSTAL, DFT etc.)
  – very large MD simulations (NAMD)
• For a credible consortium bid we need to focus on applications which have
  – acceptable scaling now (perhaps involving migration to new codes, e.g. NAMD)
  – heavy CPU or memory demands (e.g. CRYSTAL)
  – potential for algorithmic development to exploit 1000s of processors (e.g. pathway optimisation, Monte Carlo etc.)